Prof. Xin Liu
Last edited: August 8, 2025
Programmatically compiling RegEx to DFA
Last edited: August 8, 2025
A high level sketch:
Another High Level Sketch
Step 1: Write some RegExps
Do that.
Step 2: Construct R
For the regular expressions you have defined for keywords, identifiers, numbers, etc., we want to construct one large union regular expression:
\begin{align} R &= \text{Keyword} + \text{Identifier} + \text{Number} + \dots \\ &= R_1 | R_2 | R_3 | \dots \end{align}
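The union above can be sketched in Python as follows. This is only an illustration under stated assumptions: the token classes and their patterns in `TOKEN_SPECS` are hypothetical, and Python's `re` module is a backtracking matcher rather than a compiled DFA, so the sketch shows how \(R\) is assembled from the \(R_i\), not how the DFA itself is built.

```python
import re

# Hypothetical token classes; the names and patterns are illustrative assumptions.
# Order matters: earlier alternatives win when several match the same lexeme.
TOKEN_SPECS = [
    ("KEYWORD", r"\b(?:if|else|while|return)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("NUMBER", r"\d+"),
]


def build_union(specs):
    """Join each R_i into one alternation R = R_1 | R_2 | ... using named groups."""
    return re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in specs))


R = build_union(TOKEN_SPECS)


def classify(lexeme):
    """Return the name of the alternative that matches the whole lexeme, or None."""
    match = R.fullmatch(lexeme)
    return match.lastgroup if match else None
```

Because each \(R_i\) sits in its own named group, `match.lastgroup` reports which alternative accepted the lexeme, which is the information a tokenizer needs in addition to the match itself.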
Step 3: Tokenization
For input \(x_1, \dots, x_{n}\), for each \(i \in 1, \dots, n\) inclusive, we check:
Project Proposal: Lookahead Sampler
Last edited: August 8, 2025
Introduction
Recent advances in language models (LMs) introduced the possibility of in-context, few- or zero-shot reasoning (Brown et al. 2020) using LMs with little or no fine-tuning.
Yet, classically, LM decoding takes place in a left-to-right fashion, auto-regressively resolving one token at a time by sampling from the output distribution of possible next words without multi-step planning.
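As a minimal sketch of this left-to-right decoding loop, consider the toy example below. The "model" here is a hypothetical lookup table (`TOY_MODEL`) that conditions only on the previous token, which is an assumption made for brevity; a real LM conditions on the whole prefix. The point is only that each step samples one token from the current output distribution and never plans ahead.

```python
import random

# Hypothetical toy "model": maps the last token to a next-token distribution.
# A real LM would condition on the entire prefix; this table is illustrative only.
TOY_MODEL = {
    "<s>": {"the": 0.6, "a": 0.4},
    "the": {"cat": 0.5, "dog": 0.5},
    "a": {"cat": 0.5, "dog": 0.5},
    "cat": {"</s>": 1.0},
    "dog": {"</s>": 1.0},
}


def decode(model, rng, max_len=10):
    """Sample one token at a time, left to right, until </s> or max_len."""
    tokens = ["<s>"]
    while len(tokens) < max_len and tokens[-1] != "</s>":
        dist = model[tokens[-1]]
        words = list(dist)
        weights = [dist[w] for w in words]
        # Each step commits to a single sampled token with no lookahead.
        tokens.append(rng.choices(words, weights=weights, k=1)[0])
    return tokens


sample = decode(TOY_MODEL, random.Random(0))
```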
Work in LM agents has taken steps to solve more complex problems that would typically require multi-step reasoning, even while using this direct decoding approach. The simplest idea, named “chain-of-thought” (CoT), involves forcing the LM at decode time to begin the decoding process with natural language reasoning about its actions (Wei et al. 2022). The method has contributed to the creation of powerful language agents (Yao, Zhao, et al. 2023) that can reason about complex actions.
Project80
Last edited: August 8, 2025
Project80 is a podcast hosted by Houjun Liu, Anoushka Krishnan, Micah Brown, Mia Tavares, among others.
College Application w.r.t. Project80
Cheese mission statement: Project80 is a good way of creating a self-propagating set of learning that would serve to benefit and educate future generations in hopes of creating a more equitable planet.
Project80 Abstract
Last edited: August 8, 2025
Natural science education resources traditionally teach only codified theory. While theory education is crucial, much of academic science takes place via scrutinizing contested scientific discourse. Due to such resources’ content complexity, high school students are rarely exposed to current, debatable, and relevant science. In response, we introduce Project80: a systemic, student-run protocol to synthesize the latest primary literature in a sub-field into approachable, produced multimedia educational content. The protocol is run by a team of 7 students over the course of 1 month. Students running the protocol consume complex scientific literature, distill relevant data and findings, and synthesize a culminating product of audiovisual content to supplement existing biology and chemistry pedagogy. The system runs independently with limited faculty involvement. Our analysis indicates that the multimedia content created by this protocol will be relevant to roughly 30 courses locally at our institution and will have further extensions in secondary education beyond.