Speech Translation with Prosody Modification

Translating speech into a target language without loss of prosody

Existing speech translation systems convey the content of the source speech well, but they fail to carry its prosody or emotion over into the target audio.

English audio
Translated to Telugu
Hindi audio
Translated to Telugu

We extract the subtitles and the fundamental frequency \(f_0\) from the source audio. We then use a pretrained neural vocoder, fed with the modified \(f_0\) and the mel spectrogram, to generate the target-language audio.
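
The description above does not say how \(f_0\) is modified, so the following is only a minimal sketch of one plausible approach, not the method used here. It assumes the source \(f_0\) contour is a list of per-frame values in Hz with 0 marking unvoiced frames, stretches it to the number of frames in the target utterance, and shifts it to a chosen mean pitch in the log domain. The function names `resample_contour`, `shift_log_mean`, and `modify_f0` are hypothetical.

```python
import math


def resample_contour(f0, n_frames):
    """Stretch or compress an f0 contour to n_frames by linear interpolation.

    Assumption: unvoiced frames are stored as 0 and must stay 0.
    """
    if n_frames <= 0 or not f0:
        return []
    if n_frames == 1 or len(f0) == 1:
        return [f0[0]] * n_frames
    step = (len(f0) - 1) / (n_frames - 1)
    out = []
    for i in range(n_frames):
        pos = i * step
        lo = int(math.floor(pos))
        hi = min(lo + 1, len(f0) - 1)
        frac = pos - lo
        a, b = f0[lo], f0[hi]
        if a <= 0 or b <= 0:
            # Do not interpolate across a voiced/unvoiced boundary;
            # take the nearer frame instead.
            out.append(a if frac < 0.5 else b)
        else:
            out.append(a + frac * (b - a))
    return out


def shift_log_mean(f0, target_mean_hz):
    """Scale voiced frames so their geometric mean equals target_mean_hz."""
    voiced = [v for v in f0 if v > 0]
    if not voiced:
        return list(f0)
    log_mean = sum(math.log(v) for v in voiced) / len(voiced)
    ratio = target_mean_hz / math.exp(log_mean)
    return [v * ratio if v > 0 else 0.0 for v in f0]


def modify_f0(source_f0, n_target_frames, target_mean_hz):
    """Fit the source contour to the target length, then shift its pitch level."""
    stretched = resample_contour(source_f0, n_target_frames)
    return shift_log_mean(stretched, target_mean_hz)
```

For example, `modify_f0([100.0, 200.0, 0.0, 100.0], 7, 150.0)` returns a 7-frame contour that keeps the rise, the unvoiced gap, and the relative pitch intervals of the source while centering the voiced frames on 150 Hz. Scaling in the log domain is chosen here because pitch intervals are perceived as ratios; a real system would also need to align the contour with the translated words rather than stretching it uniformly.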