
Model Evaluation


Some ideas for model validation:

Cross Validation

Hold-out cross-validation

For instance, you can do:

  • 70% for training
  • 30% held out as a cross-validation set for testing

But at very large dataset scales, the validation set can be capped at a fixed size rather than a fixed percentage (holding out only 0.1% of a huge dataset may still leave you with 10k validation samples).
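A minimal sketch of the hold-out split, assuming NumPy and scikit-learn are available (the random data is filler; `train_test_split` accepts either a fraction or an absolute count for the hold-out size):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# stand-in data (replace with a real design matrix / labels)
X = np.random.randn(1_000, 5)
y = np.random.randn(1_000)

# 70% train / 30% hold-out, as in the split above
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.30, random_state=0)

# at very large scales, cap the hold-out set at a fixed count instead of a ratio
VAL_CAP = 10_000
val_size = min(int(0.30 * len(X)), VAL_CAP)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=val_size, random_state=0)
```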

k-fold cross validation

  1. shuffle the data
  2. divide the data into \(k\) equal sized pieces
  3. repeatedly train the algorithm on \(k-1\) of the pieces and test on the remaining one (e.g., with \(k = 5\), train on 4/5 of the data and test on the remaining 1/5)

In practice people do 10 folds.
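A sketch of that procedure, again assuming scikit-learn; the ridge-regression model and random data are placeholders, and the loop structure is the point:

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

# placeholder data
X = np.random.randn(200, 5)
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * np.random.randn(200)

kf = KFold(n_splits=10, shuffle=True, random_state=0)  # shuffle, then split into k pieces
fold_errors = []
for train_idx, test_idx in kf.split(X):
    model = Ridge().fit(X[train_idx], y[train_idx])    # train on the other k-1 pieces
    preds = model.predict(X[test_idx])                 # evaluate on the held-out piece
    fold_errors.append(mean_squared_error(y[test_idx], preds))

print(f"mean MSE over {kf.get_n_splits()} folds: {np.mean(fold_errors):.4f}")
```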

Model Selection


Model selection: choosing among a set of candidate models or hyperparameters, typically by comparing their performance under a validation procedure such as cross-validation.

A special case of model selection is feature selection:

  • choose a subset of the most relevant features to train on
  • note that the power set of \(m\) features has \(2^{m}\) subsets; instead of searching it exhaustively, we do a greedy forward search, training only \(O(m)\) models per pass: start with an empty set and sequentially add whichever feature gives the best performance (a sketch follows this list)
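A sketch of greedy forward search; the logistic-regression scorer, 5-fold scoring, and toy data are all placeholder choices, and the structure is: start empty, repeatedly add the single best remaining feature.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def forward_select(X, y, n_keep):
    """Greedy forward feature search: O(m) model fits per pass instead of 2^m subsets."""
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < n_keep:
        best_feat, best_score = None, -np.inf
        for f in remaining:
            cols = selected + [f]
            score = cross_val_score(LogisticRegression(max_iter=1000),
                                    X[:, cols], y, cv=5).mean()
            if score > best_score:
                best_feat, best_score = f, score
        selected.append(best_feat)
        remaining.remove(best_feat)
    return selected

# toy usage with random data (stands in for a real design matrix)
X = np.random.randn(300, 8)
y = (X[:, 0] - 2 * X[:, 3] > 0).astype(int)
print(forward_select(X, y, n_keep=3))
```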

overfitting


Consider something like polynomial interpolation:

Interpolating polynomials (and most ML models in general) are smooth, so interpolating between training points can “overshoot” nearby points and “bounce around.”
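A quick numerical illustration with NumPy (the sample points are arbitrary): fit the exact interpolating polynomial through a handful of points and evaluate it on a fine grid between them; its range typically blows past the range of the data.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 10)
y = np.sin(2 * np.pi * x) + 0.1 * rng.normal(size=10)

# degree n-1 polynomial passes through all n points exactly
coeffs = np.polyfit(x, y, deg=len(x) - 1)

# evaluate between the training points
x_fine = np.linspace(0, 1, 500)
y_fine = np.polyval(coeffs, x_fine)

print("range of the data:       ", y.min(), y.max())
print("range of the interpolant:", y_fine.min(), y_fine.max())
```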

…as a function of parameters

For a fixed dataset, simply increasing the number of parameters will eventually increase test error even as training error keeps falling, i.e., the model overfits (see the sketch below).
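A rough sketch of that effect, using polynomial degree as the proxy for parameter count on a fixed, small dataset (the specific numbers and data-generating function are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, 30)                        # fixed dataset
y = np.sin(2 * np.pi * x) + 0.2 * rng.normal(size=30)
x_tr, y_tr = x[:20], y[:20]                      # train on 20 points
x_te, y_te = x[20:], y[20:]                      # hold out 10 points

def mse(coeffs, xs, ys):
    return np.mean((np.polyval(coeffs, xs) - ys) ** 2)

for deg in (1, 3, 5, 9, 15):                     # more parameters = higher degree
    c = np.polyfit(x_tr, y_tr, deg)
    # training error keeps falling with degree; held-out error eventually climbs back up
    print(f"degree {deg:2d}: train MSE {mse(c, x_tr, y_tr):.4f}   test MSE {mse(c, x_te, y_te):.4f}")
```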

SU-CS229 OCT062025


Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

Decisions.jl


A general formulation of decision networks. The workflow:

  1. track how distributions affect each other (i.e., generalize your problem into a Bayes net)
  2. apply transformations to the edges to restructure it for your new problem type
  3. translate the transformed network into your new formulation’s solver
  4. solve