Hi
In our latest paper we show that the GAN loss used by almost all latent diffusion models to train their autoencoders is not required and can instead be replaced with a diffusion loss. Our autoencoder is trained end-to-end and achieves higher compression and better generation quality.
I am excited to share it with you. Let me know what you think.
Cheers
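To make the idea concrete, here is a toy numpy sketch of what "replace the GAN loss with a diffusion loss" can look like on the decoder side. This is my own illustration, not the paper's actual model: the linear encoder, the `predict_noise` placeholder, the single fixed noise level, and all shapes are made up for clarity.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, w_enc):
    # Toy linear "encoder": image -> low-dimensional latent.
    return x @ w_enc

def diffusion_decoder_loss(x, z, predict_noise, sigma=0.5):
    # Noise-prediction (denoising) objective: corrupt the image and ask
    # the decoder to predict the injected noise, conditioned on latent z.
    eps = rng.standard_normal(x.shape)
    x_noisy = x + sigma * eps
    eps_hat = predict_noise(x_noisy, z)
    return float(np.mean((eps_hat - eps) ** 2))

# Toy data: 8 "images" of 16 pixels each, compressed to 4-dim latents.
x = rng.standard_normal((8, 16))
w_enc = rng.standard_normal((16, 4)) / 4.0
z = encode(x, w_enc)

# Placeholder noise predictor (a real one would be a conditional network);
# predicting zero noise gives a loss near E[eps^2] = 1.
loss = diffusion_decoder_loss(x, z, lambda x_noisy, z: np.zeros_like(x_noisy))
print(f"diffusion loss: {loss:.3f}")
```

Unlike a GAN loss, this objective needs no discriminator, so the whole autoencoder can be trained end-to-end with a single regression-style loss.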
I just saw https://hanlab.mit.edu/projects/hart
It seems to be another autoencoder (autoregressive) + diffusion approach.
This is very interesting. Unlike us (who focus on the decoder), they focus on changing the representation itself so that they can achieve better generation. Thanks for the link.
They use an autoencoder/autoregressive model to predict the big picture, and diffusion for the details, similar to yours. The difference is that they use discrete tokens.
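A minimal numpy sketch of that split, i.e. discrete tokens carrying the big picture while a residual is left for diffusion to fill in. The uniform scalar quantizer here is my own stand-in for HART's learned tokenizer, purely to show the decomposition:

```python
import numpy as np

rng = np.random.default_rng(1)

def quantize(x, levels=4):
    # Uniform scalar quantizer: map each value to the nearest of `levels`
    # bin centers. Returns discrete token ids and their reconstructions.
    lo, hi = x.min(), x.max()
    idx = np.rint((x - lo) / (hi - lo) * (levels - 1)).astype(int)
    centers = lo + idx / (levels - 1) * (hi - lo)
    return idx, centers

x = rng.standard_normal((4, 16))   # toy "images"
tokens, coarse = quantize(x)       # an AR model would predict `tokens`
residual = x - coarse              # a diffusion model would add this detail

# The residual is bounded by half a quantization step, so the diffusion
# stage only has to capture fine detail, not global structure.
step = (x.max() - x.min()) / 3     # bin width for levels - 1 = 3
print(f"max residual: {np.abs(residual).max():.3f}, half step: {step / 2:.3f}")
```

The point of the decomposition is that the discrete tokens are cheap to model autoregressively, while the bounded residual is an easy target for a small diffusion model.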