Posts

VFUA

Last edited: August 8, 2025

this is worse

VGG

Last edited: August 8, 2025

VGGish

Last edited: August 8, 2025

VGGish is VGG, ish: a network based on VGG that is pretrained for audio feature extraction.

Video Generation with Learned Priors

Last edited: August 8, 2025

visual prediction task

Given \(n\) frames of video \(x_{1}, \dots, x_{n}\), predict the \(T\) subsequent frames \(x_{n+1}, \dots, x_{n+T}\).
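To make the input/output shape of the task concrete, here is a minimal sketch with a trivial baseline that repeats the last observed frame \(T\) times. The `Frame` type, the function names, and the toy video are assumptions made for illustration only; no real model is involved.

```python
from typing import List, Sequence

Frame = List[List[float]]  # hypothetical H x W grayscale frame


def predict_copy_last(context: Sequence[Frame], T: int) -> List[Frame]:
    """Baseline: return T copies of the last observed frame x_n."""
    if not context:
        raise ValueError("need at least one context frame")
    last = context[-1]
    return [[row[:] for row in last] for _ in range(T)]


def mse(pred: Sequence[Frame], target: Sequence[Frame]) -> float:
    """Mean squared error over all pixels of all frames."""
    diffs = [
        (p - q) ** 2
        for frame_p, frame_q in zip(pred, target)
        for row_p, row_q in zip(frame_p, frame_q)
        for p, q in zip(row_p, row_q)
    ]
    return sum(diffs) / len(diffs)


# Toy video: frame k is a 2x2 image filled with the value k (k = 1..8).
video = [[[float(k)] * 2 for _ in range(2)] for k in range(1, 9)]
n, T = 5, 3
context, target = video[:n], video[n:n + T]  # x_1..x_n and x_{n+1}..x_{n+T}
pred = predict_copy_last(context, T)
error = mse(pred, target)
```

A learned predictor would replace `predict_copy_last` while keeping the same contract: \(n\) frames in, \(T\) frames out.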

Action Conditioned Prediction Network

A visual prediction task with two inputs:

  1. the action taken
  2. the image

Predict the next \(t\) frames. The challenge: what is the action? Predicting the action is not easy.
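As a rough illustration of the interface only, the sketch below stands in for the network with a hand-written rule: the action is a hypothetical `(dy, dx)` shift and the "prediction" cyclically shifts the image by it. It assumes the action is already known, which is exactly the part flagged as hard above; all names here are made up for the example.

```python
from typing import List, Sequence, Tuple

Frame = List[List[int]]
Action = Tuple[int, int]  # hypothetical (dy, dx) shift applied to the scene


def predict_next(image: Frame, action: Action) -> Frame:
    """Toy stand-in for the network: cyclically shift the image by the action."""
    h, w = len(image), len(image[0])
    dy, dx = action
    return [[image[(r - dy) % h][(c - dx) % w] for c in range(w)] for r in range(h)]


def rollout(image: Frame, actions: Sequence[Action]) -> List[Frame]:
    """Predict one frame per action, feeding each prediction back in as input."""
    frames = []
    for action in actions:
        image = predict_next(image, action)
        frames.append(image)
    return frames


image = [[1, 0, 0], [0, 0, 0], [0, 0, 0]]
right = rollout(image, [(0, 1), (0, 1)])  # move right twice
down = rollout(image, [(1, 0)])           # move down once
```

The same image yields different futures under different actions, which is the point of conditioning on the action.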

RAFI

Conditional Flow Matching with Video Generation.

https://arxiv.org/pdf/2406.14436
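For reference, a common linear-path form of conditional flow matching is sketched below. This is the generic textbook objective, not necessarily the exact formulation in the linked paper; the symbols \(z_0\) (noise sample), \(z_1\) (data sample, e.g. the future frames), \(c\) (conditioning, e.g. the context frames), and \(v_\theta\) (learned velocity field) are assumptions introduced here.

\[
z_t = (1 - t)\, z_0 + t\, z_1, \qquad
\mathcal{L}(\theta) = \mathbb{E}_{t,\, z_0,\, z_1} \left\| v_\theta(z_t, t, c) - (z_1 - z_0) \right\|^2
\]

with \(t \sim \mathcal{U}[0, 1]\) and \(z_0 \sim \mathcal{N}(0, I)\). Sampling then integrates \(\frac{dz}{dt} = v_\theta(z, t, c)\) from \(t = 0\) to \(t = 1\).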

Vietnam

Last edited: August 8, 2025