Random thoughts as I scan through the book:
Central framing: learning as a means to generalize + predict
Key Tasks
- regression (predict a value)
- binary classification (sort an example into a boolean class of Y/N)
- multi-class classification (sort an example into one of several classes)
- ranking (sort a set of items into an order of relevance)
Optimality of Bayes Optimal Classifier
If we have the underlying data distribution, we can classify \(y\) from \(x\) by choosing:
\begin{equation} f(x) = \arg\max_{y \in Y} P(x, y) \end{equation}
where \(P(a,b)\) is the probability of the pair \((a,b)\) under the distribution that generated \(D\); but like you don’t actually have that distribution, so sadge.
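A minimal sketch of what this would look like if we did have the joint distribution as a lookup table (toy numbers and names of my own, purely illustrative):

```python
# Toy joint distribution P(x, y) as a lookup table.
# These numbers are made up; in practice we never have access to P.
P = {
    ("sunny", "play"): 0.30,
    ("sunny", "stay_home"): 0.10,
    ("rainy", "play"): 0.05,
    ("rainy", "stay_home"): 0.55,
}
LABELS = ["play", "stay_home"]

def bayes_optimal(x):
    """Return the label y maximizing the joint probability P(x, y)."""
    return max(LABELS, key=lambda y: P.get((x, y), 0.0))

print(bayes_optimal("sunny"))  # play
print(bayes_optimal("rainy"))  # stay_home
```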
Optimality:
assume for contradiction that \(\exists\) some \(g\) which works better than \(f\); then there must be some \(x\) with \(g(x) \neq f(x)\) on which \(g\) does better. On such an \(x\), the error rate of \(f\) is \(1-P(x, f(x))\) and that of \(g\) is \(1-P(x, g(x))\). Yet \(f(x)\) maximizes \(P(x, y)\) over \(y\) by definition, so \(P(x, g(x)) \leq P(x, f(x))\) and hence \(1-P(x, f(x)) \leq 1-P(x, g(x))\); recall \(P\) is the distribution on actual data, so \(g\) is no better than \(f\) on any \(x\), reaching contradiction. \(\blacksquare\)
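Concretely (toy numbers of my own): if, given some \(x\), the label is \(+1\) with probability \(0.7\) and \(-1\) with probability \(0.3\), then \(f\) predicts \(+1\) and errs on \(x\) with probability \(0.3\); any \(g\) that disagrees there predicts \(-1\) and errs with probability \(0.7\).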
Decision Tree
Bias
Inductive Bias
Inductive Bias is defined here as the bias towards a particular choice of solution among the set of valid solutions, in the absence of data that would further narrow down the relevant concept (e.g., a decision tree learner's preference for shallow trees).
Nice.
Confused about their treatment of parity, will come back later.
So true

so true
Sooo true

I officially have no opinions on their treatment of KNN/k-means or single-layer perceptrons
I wish there were a proof of why k-means works
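For what it's worth, the standard convergence sketch for Lloyd's algorithm (my own recollection, not from the book): both steps monotonically decrease the objective
\begin{equation} J = \sum_{i} \| x_{i} - \mu_{c(i)} \|^{2} \end{equation}
reassigning each point to its nearest center minimizes \(J\) over assignments with the centers fixed, and recomputing each center as the mean of its cluster minimizes \(J\) over centers with the assignments fixed. Since \(J\) never increases, strictly decreases whenever an assignment changes (with consistent tie-breaking), and there are only finitely many assignments, the algorithm terminates at a local optimum; none of this says the local optimum is the global one.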
Perceptron
See logistic regression and Neural Networks
Perceptron Convergence
Suppose \(D\) is linearly separable with margin \(\gamma > 0\) (otherwise it wouldn’t be very separable); assume \(\| x \| \leq 1\) for every example; then the 1-layer perceptron makes at most \(\frac{1}{\gamma^{2}}\) updates before converging.
Sketch: every mistake triggers the update \(w^{(k)} = w^{(k-1)} + yx\). Dotting with a unit separator \(w^{*}\) (\(\|w^{*}\| = 1\), margin \(\gamma\)) gives \(w^{*} \cdot w^{(k)} \geq k \gamma\). For the norm: updates only happen when \(y(w \cdot x) \leq 0\) and \(\|x\| \leq 1\), so each one grows \(\| w^{(k)} \|^{2}\) by at most \(1\), giving \(\| w^{(k)} \|^{2} \leq k\). Now do algebra: by Cauchy–Schwarz, \(k \gamma \leq w^{*} \cdot w^{(k)} \leq \| w^{(k)} \| \leq \sqrt{k}\).
We will obtain \(k \leq \frac{1}{\gamma^{2}}\).
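And a minimal runnable sketch of the algorithm the proof is about, assuming \(\pm 1\) labels and examples scaled so \(\|x\| \leq 1\) (my own toy code, not the book's):

```python
import numpy as np

def perceptron(X, y, max_epochs=100):
    """Single-layer perceptron: on every mistake, update w <- w + y_i * x_i."""
    w = np.zeros(X.shape[1])
    for _ in range(max_epochs):
        mistakes = 0
        for xi, yi in zip(X, y):
            if yi * np.dot(w, xi) <= 0:  # misclassified (or on the boundary)
                w += yi * xi             # the update the convergence bound counts
                mistakes += 1
        if mistakes == 0:                # separates the training data; done
            break
    return w

# Toy linearly separable data (||x|| <= 1 for each row)
X = np.array([[0.9, 0.1], [0.2, 0.8], [-0.7, -0.3], [-0.1, -0.9]])
y = np.array([1, 1, -1, -1])
print(np.sign(X @ perceptron(X, y)))  # [ 1.  1. -1. -1.]
```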
