Speech Translation with prosody modification
Translating speech into a target language without loss of prosody
Existing speech translation systems convey the meaning of the source speech, but they fail to carry its prosody or emotion over into the target audio.
We extract the subtitles and the fundamental frequency \(f_0\) from the source audio. We then feed the modified \(f_0\) and the mel spectrogram to a pretrained neural vocoder to synthesize the target-language audio.
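
The description above does not say how \(f_0\) is extracted or modified, so the following is only a minimal sketch under stated assumptions. It assumes a plain autocorrelation pitch estimator and a log-domain mean/variance transfer, a common trick in voice conversion, for the modification step. The function names `estimate_f0` and `transfer_f0`, the thresholds, and the target statistics are all hypothetical. The vocoder stage is not shown.

```python
import math


def estimate_f0(frame, sample_rate, fmin=60.0, fmax=400.0, voicing_threshold=0.3):
    """Estimate f0 in Hz for one frame from its autocorrelation peak.

    Returns 0.0 when the frame is treated as unvoiced (assumed convention).
    """
    n = len(frame)
    energy = sum(x * x for x in frame)
    if energy < 1e-8:
        return 0.0  # silence
    # Search only lags that correspond to the [fmin, fmax] pitch range.
    min_lag = max(1, int(sample_rate / fmax))
    max_lag = min(int(sample_rate / fmin), n - 1)
    best_lag, best_corr = 0, 0.0
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[i] * frame[i + lag] for i in range(n - lag))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    # Weak periodicity relative to frame energy is treated as unvoiced.
    if best_lag == 0 or best_corr / energy < voicing_threshold:
        return 0.0
    return sample_rate / best_lag


def transfer_f0(source_f0, target_mean_log, target_std_log):
    """Map a source f0 contour onto target log-f0 statistics.

    Voiced frames are z-normalized in the log domain and rescaled, so the
    shape of the contour is kept while its range moves to the target voice.
    Unvoiced frames (f0 == 0) are passed through unchanged.
    """
    logs = [math.log(f) for f in source_f0 if f > 0]
    if not logs:
        return list(source_f0)
    mean = sum(logs) / len(logs)
    std = math.sqrt(sum((v - mean) ** 2 for v in logs) / len(logs))
    modified = []
    for f in source_f0:
        if f <= 0:
            modified.append(0.0)
        else:
            z = (math.log(f) - mean) / std if std > 0 else 0.0
            modified.append(math.exp(target_mean_log + z * target_std_log))
    return modified


# Usage: a synthetic 200 Hz tone stands in for one voiced speech frame.
sample_rate = 16000
frame = [math.sin(2 * math.pi * 200.0 * i / sample_rate) for i in range(1024)]
print(estimate_f0(frame, sample_rate))  # close to 200 Hz

# Shift a toy contour toward a higher-pitched target voice (made-up statistics).
contour = [0.0, 100.0, 200.0, 0.0]
print(transfer_f0(contour, math.log(200.0), math.log(2.0) / 2))
```

Working in the log domain keeps relative pitch movements, which listeners perceive roughly logarithmically, intact when the contour is moved to a different speaker range. A production system would more likely use a robust tracker such as pYIN in place of the raw autocorrelation peak.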