Video Generation with Learned Priors
Last edited: August 8, 2025
visual prediction task
Given \(n\) frames of video \(x_{1}, \dots, x_{n}\), predict the \(T\) subsequent frames \(x_{n+1}, \dots, x_{n+T}\).
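As a rough illustration of the task setup, the sketch below rolls a one-step predictor forward \(T\) times, feeding each prediction back in as context. The names `rollout` and `linear_extrapolation` are hypothetical, and the extrapolation baseline is only a stand-in for a learned model; frames are flattened lists of floats rather than images.

```python
from typing import Callable, List, Sequence

Frame = List[float]  # a flattened frame; real frames would be image tensors


def linear_extrapolation(context: Sequence[Frame]) -> Frame:
    """Stand-in one-step predictor: extrapolate from the last two frames."""
    if len(context) < 2:
        return list(context[-1])
    prev, last = context[-2], context[-1]
    return [2 * b - a for a, b in zip(prev, last)]


def rollout(
    context: Sequence[Frame],
    horizon: int,
    predict_next: Callable[[Sequence[Frame]], Frame],
) -> List[Frame]:
    """Predict `horizon` future frames by feeding predictions back in."""
    frames = [list(f) for f in context]
    for _ in range(horizon):
        frames.append(predict_next(frames))
    # Return only the predicted frames x_{n+1}, ..., x_{n+T}.
    return frames[len(context):]


observed = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]  # n = 3 observed frames
future = rollout(observed, horizon=2, predict_next=linear_extrapolation)  # T = 2
```

Because each predicted frame becomes input for the next step, errors from the one-step predictor compound over the horizon, which is one reason long rollouts are hard.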
Action Conditioned Prediction Network
Visual prediction task. Two inputs:
- taken action
- image
Predict the next \(t\) frames. The challenge: what is the action? Inferring the action is not easy.
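To make the two-input interface concrete, here is a toy sketch under strong assumptions: frames are short lists of pixels, the action set `ACTIONS` is hypothetical, and each action just shifts the frame cyclically. Recovering the action is trivial here only because the dynamics are known and the action set is tiny; with real video neither holds.

```python
from typing import List

Frame = List[int]
ACTIONS = {"left": -1, "stay": 0, "right": 1}  # hypothetical action set


def predict_next_frame(frame: Frame, action: str) -> Frame:
    """Toy dynamics: the action cyclically shifts the frame's pixels."""
    shift = ACTIONS[action] % len(frame)
    return frame[-shift:] + frame[:-shift] if shift else list(frame)


def infer_action(frame: Frame, next_frame: Frame) -> str:
    """Recover the action by testing which one explains the transition."""
    for name in ACTIONS:
        if predict_next_frame(frame, name) == next_frame:
            return name
    raise ValueError("no action in ACTIONS explains this transition")


frame = [1, 0, 0, 0]
moved = predict_next_frame(frame, "right")
recovered = infer_action(frame, moved)
```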
RAFI
Conditional Flow Matching with Video Generation.
Vietnam
Last edited: August 8, 2025
vietnamization
Last edited: August 8, 2025
vietnamization is a policy pursued by Richard Nixon, characterized by the gradual replacement of American troops with South Vietnamese ones.
virtual memory
Last edited: August 8, 2025
We are trying to share a resource: memory. Virtual memory allows multiple processes to use a shared pool of memory.
key goals
- multitasking: multiple processes should be able to use memory
- transparency: no process needs to know that memory is shared; each process should be able to run regardless of the number or locations of other processes
- isolation: processes shouldn’t be able to corrupt other processes’ memory
- efficiency: performance shouldn’t be degraded by sharing
virtual memory
The operating system translates virtual addresses to physical addresses in memory. Every program’s virtual addresses start at 0; this isn’t a problem, because each program’s addresses are translated separately.
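The translation can be sketched with a per-process page table. This is a minimal sketch assuming paging, which is one common scheme and not one the note itself specifies; the `AddressSpace` class, the `PageFault` exception, the 4096-byte page size, and the frame numbers are all illustrative assumptions.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when a virtual address has no mapping in the page table."""


class AddressSpace:
    """One process's mapping from virtual page numbers to physical frames."""

    def __init__(self, page_table: Dict[int, int]) -> None:
        self.page_table = page_table

    def translate(self, virtual_address: int) -> int:
        # Split the address into a page number and an offset within the page.
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.page_table:
            raise PageFault(f"unmapped virtual address {virtual_address:#x}")
        return self.page_table[page] * PAGE_SIZE + offset


# Both processes use virtual addresses starting at 0,
# but their pages map to different physical frames.
proc_a = AddressSpace({0: 5, 1: 9})
proc_b = AddressSpace({0: 2})

a_phys = proc_a.translate(0x10)  # frame 5, offset 16
b_phys = proc_b.translate(0x10)  # frame 2, offset 16
```

The same virtual address lands in different physical locations for the two processes, which covers transparency, and an address outside a process's own table raises `PageFault` instead of reaching another process's memory, which covers isolation.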
VLM to Agents
Last edited: August 8, 2025
A talk by Tao Yu.
Notation
New Concepts
Important Results / Claims
- Challenges of Agent Data Collection
- Challenges of Agent Benchmarking
- human agent interaction collection procedure
Questions
Interesting Factoids
- funny: based on Computer Agent Arena results, Claude Computer Use scores lower than normal Claude, apparently because Claude Computer Use overfit to Ubuntu