Speech Translation with Prosody Modification

Existing speech translation systems convey the context well, but they fail to incorporate the speaker's prosody or emotion into the target audio.

English audio

Translated to Telugu

Hindi audio

Translated to Telugu

We extract the subtitles and the fundamental frequency \(f_0\) from the source audio. We then use a pretrained neural vocoder, driven by the modified \(f_0\) and the mel spectrogram, to generate the target-language audio.
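The two \(f_0\) steps mentioned above can be illustrated with a minimal sketch. It is not the method used in this project: the autocorrelation pitch estimator, the log-domain contour shift, and the names `estimate_f0`, `modify_f0`, `target_mean_hz`, and `range_scale` are all assumptions chosen for illustration. The vocoder stage is left out because the source does not say which model is used.

```python
import math


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0):
    """Estimate f0 (Hz) of one frame from its autocorrelation peak.

    Illustrative assumption: a real system would likely use a more
    robust pitch tracker. Returns 0.0 for frames judged unvoiced.
    """
    min_lag = int(sample_rate / fmax)
    max_lag = min(int(sample_rate / fmin), len(frame) - 1)
    energy = sum(x * x for x in frame)
    if energy == 0.0:
        return 0.0  # silent frame, treated as unvoiced

    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(len(frame) - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr

    # A weak peak relative to the frame energy is treated as unvoiced.
    # The 0.3 threshold is an arbitrary illustrative value.
    if best_lag == 0 or best_corr / energy < 0.3:
        return 0.0
    return sample_rate / best_lag


def modify_f0(f0_contour, target_mean_hz, range_scale=1.0):
    """Shift a contour so its geometric mean equals target_mean_hz.

    Hypothetical modification rule: the shape of the source contour is
    kept in the log domain (optionally widened or narrowed by
    range_scale), and unvoiced frames (0.0) are passed through.
    """
    voiced = [f for f in f0_contour if f > 0.0]
    if not voiced:
        return list(f0_contour)
    log_mean = sum(math.log(f) for f in voiced) / len(voiced)
    log_target = math.log(target_mean_hz)
    return [
        math.exp(log_target + range_scale * (math.log(f) - log_mean))
        if f > 0.0 else 0.0
        for f in f0_contour
    ]


if __name__ == "__main__":
    sr = 16000
    # Synthetic 200 Hz tone standing in for one voiced speech frame.
    tone = [math.sin(2 * math.pi * 200 * n / sr) for n in range(1024)]
    contour = [estimate_f0(tone, sr), 0.0, 180.0]
    print(modify_f0(contour, target_mean_hz=220.0))
```

Working in the log domain keeps pitch intervals proportional, so the rises and falls of the source speaker are preserved while the overall register moves toward the target voice. The resulting contour would then be passed to the vocoder together with the mel spectrogram.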