Posts

SU-CS120 OCT082024

Last edited: August 8, 2025
  • evaluation is quite hard—you need

Classical Test Theory

  • “just average each test” (think MUC, B³, etc.)
  • test-dependent ability estimation
  • BAD: because each test may have a different difficulty
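
A minimal sketch of why a plain average is test-dependent. The two response lists are made-up data, not results from the lecture: the same hypothetical model is scored on an easy item set and on a hard one, and its "ability" changes with the test.

```python
def mean_score(responses):
    """Classical-test-theory style score: fraction of items answered correctly."""
    return sum(responses) / len(responses)

# Hypothetical responses from ONE model (1 = correct, 0 = incorrect).
easy_test = [1, 1, 1, 1, 0]
hard_test = [1, 0, 0, 0, 0]

easy_score = mean_score(easy_test)
hard_score = mean_score(hard_test)

# The two averages disagree even though the model did not change.
print(f"easy test: {easy_score:.1f}, hard test: {hard_score:.1f}")
```

Nothing in the average separates how good the model is from how hard the items are, which is the gap the next section addresses.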

Item Response Theory (IRT)

  • model item and test taker characteristics
  • test-invariant ability estimation (subset invariant)
  • adaptive testing
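
One common way to model item and test-taker characteristics is the two-parameter logistic (2PL) model. The sketch below assumes that model (the notes do not say which IRT variant was used), and the item names and parameters are hypothetical. `theta` is the test taker's ability, `a` is an item's discrimination, and `b` is its difficulty.

```python
import math

def p_correct(theta, a, b):
    """2PL item response function: probability of a correct answer."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability theta: a^2 * p * (1 - p)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)

def next_item(theta, items):
    """One adaptive-testing step: pick the item most informative at the current estimate."""
    return max(items, key=lambda name: item_information(theta, *items[name]))

# Hypothetical calibrated items: name -> (discrimination a, difficulty b).
items = {"easy": (1.0, -2.0), "medium": (1.0, 0.0), "hard": (1.0, 2.0)}

print(next_item(0.0, items))  # average test taker
print(next_item(2.1, items))  # strong test taker
```

Because difficulty lives in the item parameters rather than in the score, any subset of calibrated items estimates the same `theta`. Adaptive testing then amounts to asking the items whose difficulty sits closest to the current estimate.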

problem

  • requires calibration first
  • …which is quite costly

Flash-HELM

HELM, but prioritizing higher-ranked models: spend more of the evaluation budget on the good models.

Sang’s Method

We want to estimate \(\theta\) with a budget of \(K\) questions.

SU-CS120 SEP242024

Last edited: August 8, 2025

AI Safety

Safety as a property is dependent on the definition being used. Meaning: “AI safety focuses on technical solutions to ensure that AI systems operate safely and reliably.”

  • preventing accidents, misuse, and harmful consequences
  • machine ethics and AI alignment
  • monitoring systems for risks
  • developing norms and policies that promote safety

a divide in safety research

people either research…

  • AI is going to come and kill us all
  • AI is going to introduce and exacerbate bias

Progression in AI, back in the day

  • AlexNet
  • AlphaGo vs. AlphaFold

SU-CS143 APR012025

Last edited: August 8, 2025

it's compilers time!

digraph {
rankdir=LR;
graph [bgcolor=transparent];
node [fontcolor=white, color=white];
edge [fontcolor=white, color=white];

program -> compiler -> "binary code";
}

a bit of history

manual punch cards — slow to write

Speedcoding

  • 10-20 times slower than hand-written assembly
  • interpreter!

…nobody used it

Fortran I

John Backus

  • development time halved
  • performance is close to hand-written assembly (80%!)

Key automation: when hand-writing assembly, you had to manage the finite number of registers yourself; Fortran handled that for you.

SU-CS143 APR032025

Last edited: August 8, 2025

how to design a language: “why don’t we just make a Turing machine?” tl;dr: “writing code in a Turing Machine takes a while, writing it in C++ takes a little less while.”

  • languages fill a void: they make something previously difficult/impossible easy
  • good languages vs. language design needs are orthogonal things

why do we not change languages?

  • rewriting code is hard
  • languages with many users are replaced rarely — popular languages are ossified
  • so people just go start new niches

language vs. ends

  • SQL: query optimization by separating the data query from the access pattern (you can add indexes without rewriting code)
  • Python: library composition / FFI
  • Haskell: proofs and type safety
  • Rust: security
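
The SQL point can be sketched with Python's built-in `sqlite3` module. The `users` table, its columns, and the index name are hypothetical. The query string is never edited; only the access pattern changes when the index is created.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO users (name, age) VALUES (?, ?)",
    [(f"user{i}", i % 50) for i in range(1000)],
)

# The data query: says WHAT we want, not HOW to find it.
QUERY = "SELECT name FROM users WHERE age = 42"

def plan(sql):
    """Return SQLite's description of how it intends to execute the query."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql).fetchall()
    return " ".join(row[-1] for row in rows)  # last column is the detail text

before_rows = conn.execute(QUERY).fetchall()
before_plan = plan(QUERY)

# Change the access pattern only; QUERY itself is untouched.
conn.execute("CREATE INDEX idx_users_age ON users (age)")

after_rows = conn.execute(QUERY).fetchall()
after_plan = plan(QUERY)

print(before_plan)
print(after_plan)
```

The exact plan wording varies between SQLite versions, but the second plan should mention `idx_users_age` while the rows returned stay the same.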

language design

  • no universally accepted metrics for design
  • claim: “a good language is the one that people use” (“I don’t really buy that, because otherwise PHP would be the best language” - Fred 2025)

abstraction

abstraction — detaching high level problems from functional details, “selective ignorance”.

SU-CS143 APR082025

Last edited: August 8, 2025

Lexer

Goal: identify tokens in the input string. It's a lot of regular expressions and DFAing.

Example

Consider:

if (i == j)
    z = 0;
else
    z = 1;

We want a linear algorithm for lexing, because quadratic algorithms are slow. The goal here is to partition the input string into substrings.

Let’s make a Lexer!

  1. identify token classes
  2. describe which strings belong to each token class

token classes

token classes define all items of interest; this is dependent on the choice of language and the design of the parser.
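
A minimal sketch of the two steps applied to the if/else example above, using Python's `re` module. The token class names and patterns here are assumptions chosen for illustration; a real language would define its own set. Each class is described by a regular expression, and the lexer makes a single left-to-right pass that partitions the input into substrings.

```python
import re

# Step 1 and 2: hypothetical token classes, each described by a regular expression.
# Order matters: Python alternation takes the first alternative that matches,
# so keywords come before identifiers and "==" comes before "=".
TOKEN_SPEC = [
    ("KEYWORD",    r"\b(?:if|else)\b"),
    ("IDENTIFIER", r"[A-Za-z_][A-Za-z0-9_]*"),
    ("INTEGER",    r"[0-9]+"),
    ("OPERATOR",   r"==|="),
    ("PUNCT",      r"[();]"),
    ("WHITESPACE", r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))

def lex(source):
    """Partition source into (token_class, substring) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            raise SyntaxError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "WHITESPACE":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()  # always advances, so each character is consumed once
    return tokens

SOURCE = "if (i == j)\n    z = 0;\nelse\n    z = 1;\n"
tokens = lex(SOURCE)
for token_class, text in tokens:
    print(token_class, text)
```

Whitespace is treated as its own class and discarded, which is a design choice that suits this example; a language where indentation is significant would have to keep it.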