IEEE SP CUP 2022
Synthetic Speech Attribution
We participated in this challenge conducted by the IEEE SPS and were awarded First Runner-Up at ICASSP 2022, Singapore.
The challenge was to classify the algorithm from which a fake speech signal was generated. Unfortunately, we were not given the characteristics of the TTS systems; only class labels were provided.
We looked for pitfalls of TTS systems and found that Linear Prediction (LP) residuals, i.e., voice source features, are discriminative across different systems: compared to natural speech, fake speech tends to have an LP residual that is more periodic and more predictable.
Most TTS systems focus on mimicking Mel spectrograms, which capture vocal tract characteristics, and therefore leave behind artefacts in the voice source.
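A minimal sketch of the idea behind the voice source features: inverse-filter a short speech segment with its LP coefficients to obtain the LP residual, then measure how periodic it is. The file name, LP order and segment length here are illustrative assumptions (real systems typically do this frame-wise), not the exact setup we used.

```python
import numpy as np
import librosa
import scipy.signal

def lp_residual(y, order=16):
    """Inverse-filter the signal with its LP coefficients to get the residual."""
    a = librosa.lpc(y, order=order)            # a[0] == 1.0
    return scipy.signal.lfilter(a, [1.0], y)   # e[n] = sum_k a[k] * y[n - k]

def periodicity(e, sr, fmin=60, fmax=400):
    """Peak of the normalized autocorrelation within the pitch-lag range."""
    e = e - np.mean(e)
    ac = np.correlate(e, e, mode="full")[len(e) - 1:]
    ac = ac / (ac[0] + 1e-12)
    lo, hi = int(sr / fmax), int(sr / fmin)
    return float(np.max(ac[lo:hi]))

y, sr = librosa.load("utterance.wav", sr=16000)     # hypothetical file
seg = y[: sr // 2]                                  # short (assumed voiced) segment
e = lp_residual(seg)
print("residual periodicity:", periodicity(e, sr))  # tends to be higher for TTS output
```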


We fused voice source features (LP residuals) with vocal tract features (Mel spectrograms) and built a custom feature extractor and an x-vector system to classify the speech signals.
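The sketch below illustrates the fusion and classification idea: Mel features of the waveform and of its LP residual are concatenated along the feature axis and fed to a minimal x-vector style TDNN with statistics pooling. The layer sizes, the concatenation-based fusion and the 5-class output are assumptions for illustration, not the exact system we submitted.

```python
import torch
import torch.nn as nn

class MiniXVector(nn.Module):
    def __init__(self, feat_dim=160, num_classes=5):   # e.g. 80 Mel bins x 2 streams
        super().__init__()
        self.tdnn = nn.Sequential(
            nn.Conv1d(feat_dim, 512, kernel_size=5, dilation=1), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=2), nn.ReLU(),
            nn.Conv1d(512, 512, kernel_size=3, dilation=3), nn.ReLU(),
            nn.Conv1d(512, 1500, kernel_size=1), nn.ReLU(),
        )
        self.embedding = nn.Linear(2 * 1500, 512)   # statistics pooling doubles the dim
        self.classifier = nn.Linear(512, num_classes)

    def forward(self, x):                 # x: (batch, feat_dim, time)
        h = self.tdnn(x)
        stats = torch.cat([h.mean(dim=2), h.std(dim=2)], dim=1)  # statistics pooling
        xvec = self.embedding(stats)      # utterance-level x-vector
        return self.classifier(xvec), xvec

# Fusion: stack the two 80-bin Mel spectrograms (speech and LP residual).
mel_speech = torch.randn(4, 80, 300)      # placeholder features
mel_residual = torch.randn(4, 80, 300)
fused = torch.cat([mel_speech, mel_residual], dim=1)   # (4, 160, 300)
logits, xvecs = MiniXVector()(fused)
print(logits.shape, xvecs.shape)          # torch.Size([4, 5]) torch.Size([4, 512])
```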

We obtained 99% accuracy on both the training and validation data. We extended our study to inspect the network's attention and see which artefacts were being exploited, and found that the network attends most to regions of phoneme transitions, i.e., coarticulation.
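One possible way to carry out such an inspection is a simple gradient-based saliency map over the input frames, sketched below with the model from the previous snippet. This is a hedged stand-in for the attention analysis described above; the method we actually used may differ.

```python
import torch

model = MiniXVector()                      # model class from the previous sketch
model.eval()
fused = torch.randn(1, 160, 300, requires_grad=True)    # one fused utterance
logits, _ = model(fused)
logits[0, logits.argmax()].backward()      # gradient of the predicted class score
saliency = fused.grad.abs().sum(dim=1).squeeze(0)       # per-frame importance, shape (time,)
# High-saliency frames can then be aligned against phoneme boundaries to check
# whether coarticulation regions dominate.
print(saliency.topk(5).indices)            # most influential frames
```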
