VFUA
Last edited: August 8, 2025
this is worse
VGG
Last edited: August 8, 2025
VGGish
Last edited: August 8, 2025
VGGish is VGG, ish. VGGish is a network based on VGG and pretrained for audio feature extraction.

Video Generation with Learned Priors
Last edited: August 8, 2025
visual prediction task
Given \(n\) frames of video \(x_{1}, …, x_{n}\), predict the \(T\) subsequent frames \(x_{n+1}, …, x_{n+T}\).
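A minimal sketch of how the task splits a clip into inputs and targets. The function name `split_context_target` and the `(frames, H, W, C)` array layout are assumptions made for illustration, not something these notes specify.

```python
import numpy as np

def split_context_target(video, n, T):
    """Split a clip into n context frames and the T frames that follow them.

    `video` is assumed to have shape (frames, H, W, C).
    """
    if video.shape[0] < n + T:
        raise ValueError("clip needs at least n + T frames")
    context = video[:n]        # x_1, ..., x_n (model input)
    target = video[n:n + T]    # x_{n+1}, ..., x_{n+T} (prediction target)
    return context, target

# Hypothetical 10-frame clip of 64x64 RGB frames.
video = np.zeros((10, 64, 64, 3), dtype=np.float32)
context, target = split_context_target(video, n=4, T=3)
```

A predictor is then trained to map `context` to `target`.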
Action Conditioned Prediction Network
Visual prediction task. Two inputs:
- taken action
- image
Predict the next \(t\) frames. The challenge: what is the action? Predicting the action is not easy.
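The interface described above can be sketched with a toy model. Everything here is hypothetical: the class `ToyActionConditionedPredictor`, its random linear weights, and the flattened-frame representation stand in for a real network and only illustrate the two inputs (image and action) and the rollout over \(t\) steps.

```python
import numpy as np

class ToyActionConditionedPredictor:
    """Toy stand-in for an action-conditioned network (untrained, linear)."""

    def __init__(self, frame_dim, action_dim, seed=0):
        rng = np.random.default_rng(seed)
        # Random weights; a real model would learn these.
        self.W_frame = rng.normal(scale=0.01, size=(frame_dim, frame_dim))
        self.W_action = rng.normal(scale=0.01, size=(action_dim, frame_dim))

    def step(self, frame, action):
        """Predict the next frame from the current frame and the taken action."""
        flat = frame.reshape(-1)
        delta = flat @ self.W_frame + action @ self.W_action
        return (flat + delta).reshape(frame.shape)

    def rollout(self, frame, actions):
        """Predict len(actions) frames by feeding each prediction back in."""
        frames = []
        for action in actions:
            frame = self.step(frame, action)
            frames.append(frame)
        return np.stack(frames)

model = ToyActionConditionedPredictor(frame_dim=8 * 8, action_dim=2)
first_frame = np.ones((8, 8))
actions = np.zeros((5, 2))          # one action per predicted frame
predicted = model.rollout(first_frame, actions)
```

Note that `rollout` needs one action per future step, which is exactly where the difficulty lies when the actions are not known.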
RAFI
Conditional Flow Matching with Video Generation.
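For orientation, a minimal sketch of one common flow matching formulation (a straight-line path between a noise sample and a data sample). This is not taken from RAFI; the function names are hypothetical, and the conditioning signal (for video, e.g. the context frames) would be an extra input to the velocity model, which is omitted here.

```python
import numpy as np

def cfm_training_pair(x0, x1, t):
    """Build one regression example on the straight path from x0 to x1.

    x0: noise sample, x1: data sample, t: scalar in [0, 1].
    """
    x_t = (1.0 - t) * x0 + t * x1    # point on the path at time t
    target_velocity = x1 - x0        # constant velocity along that path
    return x_t, target_velocity

def cfm_loss(predicted_velocity, target_velocity):
    """Mean squared error between predicted and target velocities."""
    return float(np.mean((predicted_velocity - target_velocity) ** 2))

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 16))        # noise
x1 = rng.normal(size=(4, 16))        # stand-in for flattened video frames
x_t, u = cfm_training_pair(x0, x1, t=0.3)
```

A velocity model would be trained to output `u` given `x_t`, `t`, and the conditioning input.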
