SU-CS120 OCT082024
Last edited: August 8, 2025
- evaluation is quite hard: you need …
Classical Test Theory
- “just average each test” (think MUC, b3, etc.)
- test-dependent ability estimation
- BAD: because each test may have a different difficulty, so scores from different tests are not comparable
Item Response Theory (IRT)
- model item and test taker characteristics
- test-invariant ability estimation (subset invariant)
- adaptive testing
problem
- requires calibration first
- …which is quite costly
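To make the contrast between the two theories concrete, here is a minimal sketch using the standard two-parameter logistic (2PL) item response function. The item difficulties and discriminations below are made-up numbers standing in for the output of the calibration step, and the grid-search estimator is just one simple way to maximize the likelihood, not necessarily what any particular IRT toolkit does.

```python
import math

def p_correct(theta, difficulty, discrimination=1.0):
    """2PL item response function: P(correct answer | ability theta)."""
    return 1.0 / (1.0 + math.exp(-discrimination * (theta - difficulty)))

def log_likelihood(theta, responses):
    """responses: list of (difficulty, discrimination, correct) tuples."""
    total = 0.0
    for difficulty, discrimination, correct in responses:
        p = p_correct(theta, difficulty, discrimination)
        total += math.log(p if correct else 1.0 - p)
    return total

def estimate_theta(responses, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of ability."""
    grid = [lo + i * (hi - lo) / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses))

def raw_score(responses):
    """Classical test theory: just average the item scores."""
    return sum(correct for _, _, correct in responses) / len(responses)

# Hypothetical, pre-calibrated items: (difficulty, discrimination, correct?)
easy_test = [(-2.0, 1.0, True), (-1.5, 1.0, True), (-1.0, 1.0, True), (-0.5, 1.0, False)]
hard_test = [(0.5, 1.0, True), (1.0, 1.0, False), (1.5, 1.0, False), (2.0, 1.0, False)]

theta_easy = estimate_theta(easy_test)
theta_hard = estimate_theta(hard_test)

print("raw scores:", raw_score(easy_test), raw_score(hard_test))
print("ability estimates:", theta_easy, theta_hard)
```

The raw averages differ a lot between the easy and the hard test, while the two ability estimates land close together, which is the "test-invariant" property. The catch is the one listed above: the item parameters have to be calibrated before any of this works.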
Flash-HELM
HELM, but prioritizing higher-ranked models: evaluate good models more.
Sang’s Method
We want to estimate \(\theta\) with a budget of \(K\) questions.
SU-CS120 SEP242024
Last edited: August 8, 2025
AI Safety
Safety as a property is dependent on the being used. Meaning: “AI safety focuses on technical solutions to ensure that AI systems operate safely and reliably.”
- preventing accidents, misuse, and harmful consequences
- machine ethics and AI alignment
- monitoring systems for risks
- developing norms and policies that promote safety
a divide in safety research
people either research…
- AI is going to come and kill us all
- AI is going to introduce and exacerbate bias
Progression in AI, back in the day
- AlexNet
- AlphaGo vs. AlphaFold
SU-CS143 APR012025
Last edited: August 8, 2025
it's compilers time!
digraph {
rankdir=LR;
graph [bgcolor=transparent];
node [fontcolor=white, color=white];
edge [fontcolor=white, color=white];
program -> compiler -> "binary code";
}
a bit of history
manual punch cards — slow to write
Speedcoding
- 10-20 times slower than hand-written assembly
- interpreter!
…nobody used it
Fortran I
John Backus
- development time halved
- performance is close to hand-written assembly (80%!)
Key automation: when hand-writing assembly, you had to manage the finite number of registers yourself; Fortran would handle that for you.
SU-CS143 APR032025
Last edited: August 8, 2025
how to design a language: “why don’t we just make a Turing machine?” tl;dr: “writing code in a Turing machine takes a while, writing it in C++ takes a little less while.”
- languages fill a void: a new language makes something previously difficult/impossible easy
- good languages vs. language design needs are orthogonal things
why do we not change languages?
- rewriting code is hard
- languages with many users are replaced rarely — popular languages are ossified
- so people just go start new niches
language vs. ends
- SQL: query optimization by separating the data query from the access pattern (you can add indexes without rewriting code)
- Python: library composition / FFI
- Haskell: proofs and type safety
- Rust: security
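The SQL point above can be sketched with Python's built-in `sqlite3` module. The table, column, and index names are made up for illustration; the idea is only that the query text stays identical while the access pattern underneath it changes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [("ada", 36), ("grace", 45), ("alan", 41), ("edsger", 45)],
)

# The query says WHAT we want; it never mentions how to find it.
QUERY = "SELECT name FROM users WHERE age = 45 ORDER BY name"

def run(query):
    return [row[0] for row in conn.execute(query)]

def plan(query):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    rows = conn.execute("EXPLAIN QUERY PLAN " + query)
    return " | ".join(row[-1] for row in rows)

rows_before, plan_before = run(QUERY), plan(QUERY)

# Change the access pattern only: the query string is untouched.
conn.execute("CREATE INDEX idx_users_age ON users (age)")

rows_after, plan_after = run(QUERY), plan(QUERY)

print(rows_before, "via", plan_before)
print(rows_after, "via", plan_after)
```

The results are the same before and after; only the plan chosen by the query optimizer differs. The exact wording of the plan strings varies across SQLite versions, so treat the printed plans as illustrative.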
language design
- no universally accepted metrics for design
- claim: “a good language is the one that people use” (“I don’t really buy that, because otherwise PHP would be the best language” - Fred 2025)
abstraction
abstraction: detaching high-level problems from functional details, i.e. “selective ignorance”.
SU-CS143 APR082025
Last edited: August 8, 2025
Lexer
Goal: identify tokens in the input string. It's a lot of regular expressions and DFAing.
Example
Consider:
if (i == j)
z = 0;
else
z = 1;
We want a linear algorithm for lexing, because quadratic algorithms are slow. The goal here is to partition the input string into substrings.
Let’s make a Lexer!
- identify token classes
- describe which strings belong to each token
token classes
token classes define all items of interest; this is dependent on the choice of language and the design of the parser.
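As a rough sketch of the two steps above (identify token classes, then describe which strings belong to each), here is a tiny regex-based lexer for the `if`/`else` example. The class names and patterns are my own choices for illustration, not the ones a real parser for this language would necessarily want. Note also that Python's `re` is a backtracking engine rather than a DFA, so this shows the partitioning idea, not the DFA construction.

```python
import re

# (token class, regex describing which strings belong to it).
# Order matters: Python's alternation takes the first alternative that
# matches, so keywords come before identifiers and "==" before "=".
TOKEN_CLASSES = [
    ("WHITESPACE", r"\s+"),
    ("KEYWORD", r"\b(?:if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INTEGER", r"[0-9]+"),
    ("OPERATOR", r"==|="),
    ("PUNCTUATION", r"[();]"),
]
MASTER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_CLASSES)
)

def lex(source):
    """Partition source into (token_class, lexeme) pairs, left to right."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "WHITESPACE":  # whitespace separates tokens only
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens

example = "if (i == j)\n  z = 0;\nelse\n  z = 1;"
tokens = lex(example)
for token_class, lexeme in tokens:
    print(token_class, repr(lexeme))
```

Each character of the input is consumed by exactly one match, so the loop makes a single left-to-right pass. Whether whitespace is dropped here or handed on as its own token class depends on what the parser needs.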