Basics of ML for 224n

Random thoughts as I scan through the book:

Central framing: learning as a means to generalize + predict

Key Tasks

  • regression (predict a value)
  • binary classification (sort an example into a boolean class of Y/N)
  • multi-class classification (sort an example into one of several classes)
  • ranking (sort a set of objects into order of relevance)

Optimality of Bayes Optimal Classifier

If we have the underlying data distribution, we can classify \(y\) from \(x\) by choosing:

\begin{equation} f(x) = \arg\max_{y \in Y} P(x, y) \end{equation}

where \(P(a,b)\) is the probability of the pair \((a,b)\) on the data \(D\); but like, you don’t have the probability distribution over \(D\), so sadge.
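A minimal sketch of what \(f\) looks like when you do have the joint (toy distribution; every number here is made up):

```python
import numpy as np

# Hypothetical joint P(x, y) over 3 inputs and 2 labels; rows are x,
# columns are y, entries sum to 1. Pure toy numbers.
P = np.array([
    [0.20, 0.05],
    [0.10, 0.25],
    [0.30, 0.10],
])

def bayes_optimal(x):
    # f(x) = argmax_y P(x, y)
    return int(np.argmax(P[x]))

print([bayes_optimal(x) for x in range(3)])  # [0, 1, 0]

# Error rate: the mass of the label f doesn't pick, summed over x.
error = sum(P[x, 1 - bayes_optimal(x)] for x in range(3))
print(error)  # 0.25; by the argument below, no classifier beats this
```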

Optimality:

Assume for contradiction that there exists some \(g\) which works better than \(f\); then \(\exists x: g(x) \neq f(x)\) where \(g\) beats \(f\). The error rate of \(f\) on \(x\) is \(1-P(x, f(x))\), and of \(g\) is \(1-P(x, g(x))\). But \(P(x,f(x))\) is maximized over labels by definition of \(f\), so \(P(x,g(x)) \leq P(x,f(x))\), meaning \(1-P(x, f(x)) \leq 1-P(x,g(x))\); recall \(P\) is the distribution on actual data, so \(g\)'s error on \(x\) is at least \(f\)'s, contradicting the assumption that \(g\) does better. \(\blacksquare\)

Decision Tree

Bias

Inductive Bias

Inductive bias is defined here as the bias towards a particular choice of solution, out of the set of valid solutions, in the absence of data that would further narrow down the relevant concept.

Nice.

Confused about their treatment of parity, will come back later.

So true

so true

Sooo true

I officially have no opinions on their treatment of KNN/k-means or single-layer perceptrons

I wish there were a proof of why k-means works
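There is at least the standard convergence argument: each Lloyd-iteration step (assign every point to its nearest center, then move each center to its cluster's mean) weakly decreases the within-cluster sum of squares, and there are only finitely many partitions, so it terminates at a local optimum (not necessarily the global one). A minimal sketch to watch the objective fall; numpy only, toy blobs, everything here is my own:

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy data: two well-separated blobs in 2D.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])

k = 2
centers = X[rng.choice(len(X), k, replace=False)]  # init from data points

for step in range(10):
    # Assignment step: each point goes to its nearest center.
    dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Update step: each center moves to the mean of its cluster
    # (keep the old center if a cluster goes empty).
    centers = np.array([
        X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
        for j in range(k)
    ])
    # Within-cluster sum of squares: never increases across iterations.
    wcss = ((X - centers[labels]) ** 2).sum()
    print(f"step {step}: WCSS = {wcss:.2f}")
```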

Perceptron

See logistic regression and Neural Networks

Perceptron Convergence

Suppose \(D\) is linearly separable with margin \(\gamma > 0\) (otherwise it wouldn’t be very separable); assume \(\|x\| \leq 1\) for all \(x\); then the 1-layer perceptron converges after at most \(\frac{1}{\gamma^{2}}\) updates.

Sketch: each update takes \(w^{(k)} = w^{(k-1)} + yx\). Dotting with the optimal unit \(w^{*}\) (for which every example satisfies \(y \,(w^{*} \cdot x) \geq \gamma\)) gives \(w^{*} \cdot w^{(k)} \geq k\gamma\). Bounding the norm: updates only happen on mistakes, i.e. when \(y \,(w^{(k-1)} \cdot x) \leq 0\), and \(\|x\| \leq 1\), so \(\|w^{(k)}\|^{2} \leq k\). By Cauchy–Schwarz, \(k\gamma \leq w^{*} \cdot w^{(k)} \leq \|w^{(k)}\| \leq \sqrt{k}\).

Rearranging, we obtain \(k \leq \frac{1}{\gamma^{2}}\).
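A quick empirical check of the bound (my own sketch, not the book's): plant a unit \(w^{*}\), keep only points with margin at least \(\gamma\), run the perceptron, and compare the update count against \(1/\gamma^{2}\).

```python
import numpy as np

rng = np.random.default_rng(42)

# Build separable data: fix a unit w*, keep points whose margin is >= gamma.
d, gamma = 5, 0.1
w_star = rng.normal(size=d)
w_star /= np.linalg.norm(w_star)

X, Y = [], []
while len(X) < 200:
    x = rng.normal(size=d)
    x /= max(1.0, np.linalg.norm(x))  # enforce ||x|| <= 1
    margin = x @ w_star
    if abs(margin) >= gamma:
        X.append(x)
        Y.append(np.sign(margin))
X, Y = np.array(X), np.array(Y)

# Perceptron: on each mistake, update w <- w + y x and count it.
w = np.zeros(d)
updates, converged = 0, False
while not converged:
    converged = True
    for x, y in zip(X, Y):
        if y * (w @ x) <= 0:  # mistake (or exactly on the boundary)
            w += y * x
            updates += 1
            converged = False

print(f"{updates} updates; bound 1/gamma^2 = {1 / gamma**2:.0f}")
```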