1. Speech Editing on a Multi-Speaker Dataset (Comparison with Tan et al.)

Here we compare our speech editing system directly with Tan et al. (2021) on the VCTK dataset. The decoding process is shown in the figure on the right.


  1. p243_313

original text: for that reason cover should not be given

original audio

target text: for that reason cover is impossible to be given

Tan et al. 2021

our $\text{A}^3\text{T}$

target text: for that theoretical and realistic reason cover should not be given

Tan et al. 2021

our $\text{A}^3\text{T}$

  2. p260_013

original text: some have accepted it as a miracle without physical explanation

original audio

target text: some have accepted it as an undeniable fact without physical explanation

Tan et al. 2021

our $\text{A}^3\text{T}$

target text: some have accepted it as a miracle never seen before without physical explanation

Tan et al. 2021

our $\text{A}^3\text{T}$

  3. p323_384

original text: the idea has potential for the future