NVAE: A Leap Forward

In this post, I’ll analyze the Nouveau VAE (NVAE) paper by Arash Vahdat and Jan Kautz. It is a NeurIPS 2020 spotlight paper; you can access the paper from this link, and the source code is available.


The Nouveau Variational Autoencoder (NVAE) is the first VAE capable of generating high-quality images (up to 256 x 256, which is very good) while keeping the standard VAE objective (reconstruction plus KL). The main contribution of this paper is the use of residual conditional distributions in a hierarchical model: at each level, the posterior is parametrized as a correction to the prior, which is itself conditioned on the latents sampled at the previous levels. Moreover, the model is still fast at generating images and stable during training.
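Concretely, if the posterior at a given level is written as a correction (Δμ, Δσ) to that level's prior N(μ_p, σ_p²), the per-dimension KL term depends only on the correction and the prior's scale, as derived in the paper. A minimal NumPy sketch (the function name is mine):

```python
import numpy as np

def residual_kl(delta_mu, delta_sigma, sigma_p):
    """KL(q || p) per dimension when the posterior is parametrized
    relative to the prior: q = N(mu_p + delta_mu, (sigma_p * delta_sigma)^2)
    and p = N(mu_p, sigma_p^2). The prior mean mu_p cancels out entirely."""
    return 0.5 * (delta_mu**2 / sigma_p**2 + delta_sigma**2
                  - 2.0 * np.log(delta_sigma) - 1.0)

# When the encoder predicts no correction (delta_mu=0, delta_sigma=1),
# the posterior equals the prior and the KL term is exactly zero.
print(residual_kl(np.array([0.0]), np.array([1.0]), np.array([0.5])))  # → [0.]
```

The point of this reparametrization is that the KL stays small and well-behaved as long as the predicted corrections stay small, regardless of how the prior itself moves during training.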

Main takeaways:

  • They make the VAE a competitive model by carefully designing it. They performed extensive experiments (on five datasets: MNIST, CIFAR-10, CelebA, FFHQ, and ImageNet) and ablated their improvements one by one. NVAE outperforms the state-of-the-art non-autoregressive flow and VAE models except on ImageNet. It is also the first VAE trained on FFHQ.
  • They added depthwise separable convolutions, which keep the parameter count low enough to afford larger kernels and thus a larger receptive field. The multi-scale design helps as well.
  • They introduced batch normalization, the Swish activation function, and squeeze-and-excitation in each residual block to further boost performance, as proven by their ablation experiments.
  • To keep the KL loss bounded, they introduced a residual parametrization of the approximate posterior relative to the prior, which makes training more robust.
  • The authors also included a stability trick called spectral regularization. In short, the aim is to bound the Lipschitz constant of each layer by penalizing its largest singular value.
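To make the residual-block design concrete, here is a minimal PyTorch sketch, not the authors' code: class names, the expansion ratio, and the kernel size are my assumptions, but the 1x1-expand / 5x5-depthwise / 1x1-project layout with batch norm, Swish (`nn.SiLU`), and squeeze-and-excitation follows the cell described in the paper.

```python
import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Channel-wise gating: global average pool -> bottleneck MLP -> sigmoid."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.SiLU(),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x):
        scale = self.fc(x.mean(dim=(2, 3)))        # (B, C) channel statistics
        return x * scale[:, :, None, None]          # rescale each channel map

class ResidualCell(nn.Module):
    """BN -> Swish -> 1x1 expand -> BN -> Swish -> 5x5 depthwise
       -> BN -> Swish -> 1x1 project -> SE, plus a skip connection."""
    def __init__(self, c, expand=6):
        super().__init__()
        e = c * expand
        self.body = nn.Sequential(
            nn.BatchNorm2d(c), nn.SiLU(),
            nn.Conv2d(c, e, kernel_size=1),
            nn.BatchNorm2d(e), nn.SiLU(),
            # groups=e makes this a depthwise conv: one 5x5 filter per channel
            nn.Conv2d(e, e, kernel_size=5, padding=2, groups=e),
            nn.BatchNorm2d(e), nn.SiLU(),
            nn.Conv2d(e, c, kernel_size=1),
            SqueezeExcite(c))

    def forward(self, x):
        return x + self.body(x)

x = torch.randn(2, 32, 16, 16)
print(ResidualCell(32)(x).shape)  # torch.Size([2, 32, 16, 16])
```

The depthwise 5x5 convolution is where the receptive-field gain comes from: it costs only e·25 weights instead of e²·25 for a dense 5x5 at the expanded width.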
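Spectral regularization needs an estimate of each layer's largest singular value; the standard way to get one cheaply is power iteration. A generic NumPy sketch (not the authors' implementation; the penalty would be λ·Σᵢ s_max(Wᵢ) summed over layers):

```python
import numpy as np

def largest_singular_value(W, iters=50, rng=None):
    """Estimate s_max(W) by alternating power iteration on W and W^T.
    Bounding s_max of each layer bounds that layer's Lipschitz constant."""
    rng = rng or np.random.default_rng(0)
    v = rng.standard_normal(W.shape[1])
    for _ in range(iters):
        u = W @ v
        u /= np.linalg.norm(u)   # left singular vector estimate
        v = W.T @ u
        v /= np.linalg.norm(v)   # right singular vector estimate
    return float(u @ W @ v)      # Rayleigh-quotient-style estimate of s_max

W = np.diag([3.0, 1.0, 0.5])     # singular values are 3, 1, 0.5
print(round(largest_singular_value(W), 4))  # → 3.0
```

In practice one would keep `u` and `v` as persistent buffers per layer and run a single iteration per training step, since the weights change only slightly between steps.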

Shortcomings and limitations:

  • Despite the competitive performance, they do not compare their model to GANs, since there is still a gap; in the related-work section they do not even mention adversarial training.
  • The work is built on top of Inverse Autoregressive Flows (IAFs), but the differences could have been explained in more detail.
  • Moreover, even though their engineering effort deserves admiration, the model is sensitive to small changes in its hyperparameters.

Open questions:

  • Even though the generated faces are crisp, their texture is too smooth: there are no wrinkles on the faces. Why is that?
  • What can be done to further improve NVAE? Is this the limit of VAEs? We don’t know yet.
  • Should we count the use of mixed-precision training as a contribution? I think it’s an NVIDIA APEX library ad 🙂
