Houjun Liu

ICLR 2025, Yue et al.: Inference Scaling for Long-Context RAG

“RAG performance can scale almost linearly w.r.t. log inference FLOPs”
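Read literally, the quoted claim describes a log-linear fit. In the formula below, P is the RAG performance metric, C is the inference compute in FLOPs, and α, β are fitted constants. This notation is assumed here and is not taken from the paper. Under that reading, each doubling of compute buys roughly the same fixed gain:

```latex
P(C) \approx \alpha \log C + \beta
\quad\Longrightarrow\quad
P(2C) - P(C) \approx \alpha \log 2
```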

Demonstration Based RAG (DRAG)

Method

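Add k demonstrations as in-context examples.

Prompt: each demonstration consists of documents, an input query, and its final answer. The test documents and query follow, and the model generates the final answer.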
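A minimal sketch of assembling such a prompt, assuming a simple "Document / Question / Answer" text layout. The class `Demonstration` and the functions `format_example` and `build_drag_prompt` are hypothetical names, and the exact template the paper uses may differ.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Demonstration:
    """One in-context example: retrieved documents, a query, and its final answer."""
    documents: List[str]
    query: str
    answer: str


def format_example(documents: List[str], query: str, answer: str = "") -> str:
    """Render documents, query, and (possibly empty) answer in a fixed, hypothetical layout."""
    doc_block = "\n".join(f"Document [{i + 1}]: {d}" for i, d in enumerate(documents))
    return f"{doc_block}\nQuestion: {query}\nAnswer: {answer}".rstrip()


def build_drag_prompt(demos: List[Demonstration], test_documents: List[str], test_query: str) -> str:
    """Concatenate the k demonstrations, then the test documents and query.

    The final "Answer:" slot is left open for the model to complete.
    """
    parts = [format_example(d.documents, d.query, d.answer) for d in demos]
    parts.append(format_example(test_documents, test_query))
    return "\n\n".join(parts)


# Toy usage with a single demonstration (k = 1).
demo = Demonstration(["Paris is the capital of France."], "What is the capital of France?", "Paris")
prompt = build_drag_prompt([demo], ["Berlin is the capital of Germany."], "What is the capital of Germany?")
print(prompt)
```

Increasing k only lengthens the prompt, because every demonstration carries its own documents. The model is still called once.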

Parameters: number of documents, number of in-context examples, upper bound on the number of iterations (effectively 1 for DRAG, which answers in a single pass).
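Because the paper is about scaling inference compute, these parameters act as the knobs of a sweep. The sketch below enumerates such a grid and orders it by prompt size. It assumes that each demonstration carries as many documents as the test query. `DragConfig`, `documents_in_context`, `config_grid`, and the sweep values are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Iterable, List


@dataclass(frozen=True)
class DragConfig:
    """One DRAG inference configuration (field names are hypothetical)."""
    num_documents: int          # retrieved documents per query
    num_demonstrations: int     # in-context examples in the prompt
    max_iterations: int = 1     # assumed fixed at 1: DRAG answers in a single pass


def documents_in_context(cfg: DragConfig) -> int:
    """Total documents in the prompt.

    Assumption: each demonstration carries as many documents as the test query,
    so the prompt holds (demonstrations + 1) document sets.
    """
    return cfg.num_documents * (cfg.num_demonstrations + 1)


def config_grid(doc_counts: Iterable[int], demo_counts: Iterable[int]) -> List[DragConfig]:
    """All (documents, demonstrations) combinations, smallest prompt first."""
    grid = [DragConfig(d, m) for d, m in product(doc_counts, demo_counts)]
    return sorted(grid, key=documents_in_context)


# Hypothetical sweep values, for illustration only.
grid = config_grid([5, 10, 20], [0, 1, 2])
for cfg in grid:
    print(cfg.num_documents, cfg.num_demonstrations, documents_in_context(cfg))
```

Sorting by documents in context is a rough proxy for cost. A real sweep would measure tokens or FLOPs directly.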

Iterative Demonstration Based RAG (IterDRAG)

Method

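Same as DRAG above, but the model can also generate a new sub-query. Further documents are retrieved for that sub-query, and the model produces an intermediate answer. At each step the model decides whether to issue another sub-query or give the final answer.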

Parameters: number of documents, number of in-context examples, upper bound on the number of iterations.
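A minimal sketch of the iterative control flow, assuming three hypothetical callables: a retriever, a "decide" step that returns either a sub-query or a final answer, and an "answer" step for intermediate answers. None of these interfaces come from the paper. What happens when the iteration cap is reached is also an assumption; here the function simply returns `None`.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical interfaces (assumptions for this sketch, not the paper's API):
Retrieve = Callable[[str, int], List[str]]   # (query, k) -> up to k documents
Decide = Callable[[str], Tuple[str, str]]    # prompt -> ("subquery" | "final", text)
Answer = Callable[[str], str]                # prompt -> intermediate answer


def iter_drag(query: str, demonstrations: str, num_documents: int, max_iterations: int,
              retrieve: Retrieve, decide: Decide, answer: Answer) -> Optional[str]:
    """Loop: sub-query -> retrieve -> intermediate answer, until a final answer or the cap."""
    documents = list(retrieve(query, num_documents))
    steps: List[str] = []  # accumulated sub-queries and intermediate answers

    def prompt() -> str:
        return "\n\n".join([demonstrations, *documents, f"Question: {query}", *steps])

    for _ in range(max_iterations):
        kind, text = decide(prompt())
        if kind == "final":
            return text
        # The model chose a sub-query: fetch more documents for it, then answer it.
        documents.extend(retrieve(text, num_documents))
        steps.append(f"Sub-query: {text}")
        steps.append(f"Intermediate answer: {answer(prompt())}")
    return None  # cap reached without a final answer (assumed behaviour)


# Toy stubs that exercise the control flow (not a real model or retriever).
corpus = {"Who directed Jaws?": ["Jaws was directed by Steven Spielberg."],
          "When was Steven Spielberg born?": ["Steven Spielberg was born in 1946."]}


def toy_retrieve(q: str, k: int) -> List[str]:
    return corpus.get(q, [])[:k]


def toy_decide(p: str) -> Tuple[str, str]:
    if "Sub-query: Who directed Jaws?" not in p:
        return ("subquery", "Who directed Jaws?")
    if "Sub-query: When was Steven Spielberg born?" not in p:
        return ("subquery", "When was Steven Spielberg born?")
    return ("final", "1946")


def toy_answer(p: str) -> str:
    return "see retrieved documents"


result = iter_drag("When was the director of Jaws born?", demonstrations="", num_documents=1,
                   max_iterations=3, retrieve=toy_retrieve, decide=toy_decide, answer=toy_answer)
print(result)  # 1946
```

With `max_iterations=2` the same toy run returns `None`. This shows how the upper bound caps the number of model calls and the growth of the context.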