Disclaimer: This is a third-party implementation.
In summary, MelGAN can convert mel-spectrograms into raw audio in real time on a CPU, and it generalizes to unseen speakers with significantly fewer parameters than the previous state of the art, WaveGlow.
None of the audio samples below were seen during training. We split LJSpeech-1.1 9:1 into training and validation sets; files whose names end in "5.wav" (`*5.wav`) are held out for validation.
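As a rough illustration of that split rule, the sketch below partitions LJSpeech filenames by their suffix. The helper names `split_by_suffix` and `split_wav_dir` are hypothetical and not taken from the repository. The sketch assumes all wav files sit in one flat directory, as in the `wavs/` folder of LJSpeech-1.1.

```python
from pathlib import Path
from typing import List, Tuple


# Hypothetical helper (not from the MelGAN repository): applies the
# "*5.wav goes to validation" rule to a list of filenames.
def split_by_suffix(
    filenames: List[str], suffix: str = "5.wav"
) -> Tuple[List[str], List[str]]:
    train = [f for f in filenames if not f.endswith(suffix)]
    valid = [f for f in filenames if f.endswith(suffix)]
    return train, valid


# Hypothetical helper: assumes a flat directory of .wav files,
# e.g. LJSpeech-1.1/wavs.
def split_wav_dir(wav_dir: str) -> Tuple[List[str], List[str]]:
    names = sorted(p.name for p in Path(wav_dir).glob("*.wav"))
    return split_by_suffix(names)


if __name__ == "__main__":
    demo = ["LJ001-0005.wav", "LJ001-0006.wav", "LJ001-0015.wav", "LJ014-0285.wav"]
    train, valid = split_by_suffix(demo)
    print(train)  # ['LJ001-0006.wav']
    print(valid)  # ['LJ001-0005.wav', 'LJ001-0015.wav', 'LJ014-0285.wav']
```

The last digit of each clip index is spread roughly evenly over 0 to 9, so selecting one digit sends about a tenth of the clips to validation. That is where the 9:1 ratio comes from.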
Epoch | LJ001-0005.wav | LJ001-0015.wav | LJ014-0285.wav |
---|---|---|---|
Original | | | |
Epoch 400 | | | |
Epoch 800 | | | |
Epoch 1600 | | | |
Epoch 3200 | | | |
Epoch 6400 | | | |
Full details are in the GitHub repository's README. Thank you!
Implementation author: Seungwon Park, Myunchul Joe @ MINDsLab | Rishikesh @ DeepSync Technologies Pvt Ltd.