Disclaimer: This is a third-party implementation.
In summary, MelGAN can convert mel-spectrograms into raw audio in real time on a CPU, and it generalizes to unseen speakers with significantly fewer parameters than the previous state of the art, WaveGlow.
None of the audio samples below were seen during training. We split LJSpeech-1.1 9:1 into training and validation sets. (Files whose names match "*5.wav" are in the validation set.)
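As a rough illustration of the suffix-based split described above, here is a minimal sketch. The helper name `split_by_suffix` and the example file names are hypothetical and only mimic the LJSpeech naming pattern; the repository's own preprocessing may differ.

```python
from pathlib import Path


def split_by_suffix(wav_paths, val_suffix="5.wav"):
    """Split paths into (train, validation) lists by file-name suffix.

    Hypothetical helper: files whose names end in `val_suffix` go to the
    validation list, everything else goes to the training list.
    """
    train, val = [], []
    for path in wav_paths:
        name = Path(path).name
        if name.endswith(val_suffix):
            val.append(path)
        else:
            train.append(path)
    return train, val


# Illustrative file names only, not taken from the actual dataset listing.
example_files = [f"LJ001-{i:04d}.wav" for i in range(1, 21)]
train_files, val_files = split_by_suffix(example_files)
```

When the numeric suffixes are spread evenly over the final digit, one file in ten ends in "5", which gives a split of roughly 9:1. The exact ratio on the full dataset depends on how the file numbers are distributed.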
Full details are available in the GitHub repository's README. Thank you!