Audio samples from MelGAN vocoder

Disclaimer: This is a third-party implementation.

In summary, MelGAN can convert mel-spectrograms into raw audio in real time on a CPU, and it generalizes to unseen speakers with significantly fewer parameters than the previous state of the art, WaveGlow.

LJSpeech-1.1 (Updated 2019.12.02)

None of the audio samples below were seen during training. We split LJSpeech-1.1 9:1 into training and validation sets; files whose names match "*5.wav" are held out for validation.
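The suffix rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper name split_by_suffix and the generated file names are hypothetical, and the repository's actual split script may work differently.

```python
from pathlib import Path


def split_by_suffix(wav_names, valid_suffix="5.wav"):
    """Split file names into (train, valid) lists by file-name suffix.

    Hypothetical helper: files whose names end in `valid_suffix` go to
    validation, everything else goes to training.
    """
    train, valid = [], []
    for name in wav_names:
        if Path(name).name.endswith(valid_suffix):
            valid.append(name)
        else:
            train.append(name)
    return train, valid


# Made-up file names in the LJSpeech naming style, for illustration only.
names = [f"LJ001-{i:04d}.wav" for i in range(1, 21)]
train, valid = split_by_suffix(names)
print(len(train), len(valid))  # 18 2
print(valid)  # ['LJ001-0005.wav', 'LJ001-0015.wav']
```

Because the file numbers within each LJSpeech chapter are consecutive, holding out names that end in "5" selects roughly one file in ten, which matches the 9:1 ratio.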

Epochs LJ001-0005.wav LJ001-0015.wav LJ014-0285.wav
Original
Epoch 400
Epoch 800
Epoch 1600
Epoch 3200
Epoch 6400

All details are in the GitHub repository's README. Thank you!

Implementation authors: Seungwon Park, Myunchul Joe @ MINDsLab | Rishikesh @ DeepSync Technologies Pvt Ltd.