compositional scene representation is the process of trying to represent a certain visual signal into its constituent parts.
Aim: unsupervised segmentation + representation
- the model finds the most intuitive representations of the scene
- train segmentation and representation together
Autoencoding segmentation! Segment => Represent => Resegment => etc.
Gaussian Mixture Model???? over pixels: regularizes by taking KL Divergence between latent and predicted output, to force them to be similar.
Loss: error in RECONSTRUCTION and KL-Divergence of latent space