Posts

Video Generation with Learned Priors

Last edited: August 8, 2025

visual prediction task

Given \(n\) frames of video \(x_{1}, …, x_{n}\), predict the \(T\) subsequent frames \(x_{n+1}, …, x_{n+T}\).
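To make the indexing concrete, here is a minimal sketch of how a clip could be split into the \(n\) context frames and the \(T\) frames to predict. The helper name `split_context_target` is hypothetical, and frames are treated as opaque items (they could be arrays or tensors in practice).

```python
from typing import List, Sequence, Tuple, TypeVar

Frame = TypeVar("Frame")


def split_context_target(
    frames: Sequence[Frame], n: int, T: int
) -> Tuple[List[Frame], List[Frame]]:
    """Split a clip into n context frames and the T frames to predict."""
    if n < 1 or T < 1:
        raise ValueError("n and T must be positive")
    if len(frames) < n + T:
        raise ValueError("clip is too short for n context + T target frames")
    context = list(frames[:n])       # x_1 .. x_n
    target = list(frames[n:n + T])   # x_{n+1} .. x_{n+T}
    return context, target


# Frames are stood in for by their 1-based indices.
context, target = split_context_target(list(range(1, 11)), n=3, T=4)
print(context, target)  # [1, 2, 3] [4, 5, 6, 7]
```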

Action Conditioned Prediction Network

Visual prediction task. Two inputs:

  1. taken action
  2. image

Predict the next \(t\) frames. The challenge: what is the action? Predicting the action is itself not easy.
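One way to picture the two-input setup is as a rollout: feed the current image and an action to a model, take the predicted frame, and repeat. This is a sketch under assumptions, not the network from the note. It assumes the actions are already known (the hard part flagged above), it assumes an autoregressive loop, and `rollout` and `toy_model` are hypothetical names.

```python
from typing import Callable, List, Sequence, TypeVar

Frame = TypeVar("Frame")
Action = TypeVar("Action")


def rollout(
    predict_next: Callable[[Frame, Action], Frame],
    image: Frame,
    actions: Sequence[Action],
) -> List[Frame]:
    """Predict len(actions) future frames, feeding each prediction back in."""
    predicted: List[Frame] = []
    current = image
    for action in actions:
        current = predict_next(current, action)  # inputs: (image, action)
        predicted.append(current)
    return predicted


# Toy stand-in for a learned model: a "frame" is just an object position,
# and an action shifts that position.
def toy_model(frame: int, action: int) -> int:
    return frame + action


print(rollout(toy_model, 0, [1, 1, -2]))  # [1, 2, 0]
```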

RAFI

Conditional Flow Matching with Video Generation.

https://arxiv.org/pdf/2406.14436

Vietnam

Last edited: August 8, 2025

vietnamization

Last edited: August 8, 2025

Vietnamization is a political position held by Richard Nixon, characterized by the gradual replacement of American troops with South Vietnamese ones.

virtual memory

Last edited: August 8, 2025

We are trying to share a resource: memory. Virtual memory allows multiple processes to use a shared pool of memory.

key goals

  • multitasking: multiple processes should be able to use memory
  • transparency: no process needs to know that memory is shared; each process should be able to run regardless of the number or locations of other processes
  • isolation: processes shouldn’t be able to corrupt other processes’ memory
  • efficiency: performance shouldn’t be degraded by sharing

virtual memory

The operating system translates virtual addresses (which start at 0 for every program; this isn’t a problem, because each program has its own address space) to physical addresses in memory.
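A small sketch of the translation idea, assuming a paged scheme with one page table per process. The note does not say which mechanism is used, so the page size, the `AddressSpace` class, and the `PageFault` exception are all illustrative assumptions. It shows two processes both using virtual address 0 without colliding, and an access to an unmapped page being rejected.

```python
from typing import Dict

PAGE_SIZE = 4096  # assumed page size in bytes


class PageFault(Exception):
    """Raised when a process touches a virtual page it does not own."""


class AddressSpace:
    """Per-process mapping from virtual page number to physical frame number."""

    def __init__(self, page_table: Dict[int, int]) -> None:
        self.page_table = dict(page_table)

    def translate(self, virtual_address: int) -> int:
        if virtual_address < 0:
            raise ValueError("virtual addresses are non-negative")
        page, offset = divmod(virtual_address, PAGE_SIZE)
        if page not in self.page_table:
            raise PageFault(f"no mapping for virtual page {page}")
        return self.page_table[page] * PAGE_SIZE + offset


# Both processes start at virtual address 0 but land in different frames.
proc_a = AddressSpace({0: 5, 1: 9})
proc_b = AddressSpace({0: 2})
print(proc_a.translate(0))     # 20480
print(proc_b.translate(0))     # 8192
print(proc_a.translate(4100))  # 36868
```

Because each process only ever sees its own table, the transparency and isolation goals fall out of the lookup itself: a process cannot name a physical frame that is not in its table.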

VLM to Agents

Last edited: August 8, 2025

A talk by Tao Yu.

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

  • funny: based on Computer Agent Arena results, Claude Computer Use scores lower than normal Claude, apparently because Claude Computer Use overfit to Ubuntu