SU-CS161 OCT282025
Key Sequence
Notation
New Concepts
Important Results / Claims
Questions
Interesting Factoids
SU-CS224N APR092024
Neural Networks are powerful because of the self-organization of the intermediate levels.
Neural Network Layer
\begin{equation} z = Wx + b \end{equation}
for the pre-activation output, and the activations:
\begin{equation} a = f(z) \end{equation}
where the activation function \(f\) is applied element-wise.
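As a concrete illustration, here is a minimal NumPy sketch of a single layer; the sigmoid choice for \(f\) and the shapes of \(W\), \(b\), and \(x\) are assumptions made for this example, not fixed by the notes.

```python
import numpy as np

# Minimal sketch of one neural network layer: z = Wx + b, a = f(z).
# The sigmoid activation and the shapes below are illustrative assumptions.

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 3))   # weight matrix: 4 outputs, 3 inputs
b = rng.normal(size=4)        # bias vector
x = rng.normal(size=3)        # input vector

z = W @ x + b                 # pre-activation output
a = sigmoid(z)                # activation applied element-wise
print(a.shape)                # (4,)
```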
Why are NNs Non-Linear?
- stacking multiple linear layers adds no representational power, since a composition of linear maps is itself linear (though big linear networks can still have better learning/convergence properties!); see the sketch after this list
- most things are non-linear!
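To see the first point concretely, a quick NumPy check (with arbitrary shapes chosen only for the demo) shows that two stacked linear layers collapse into one linear map:

```python
import numpy as np

# Two stacked linear layers with no activation collapse into a single
# linear layer, so depth alone adds no representational power.
rng = np.random.default_rng(0)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(5, 3)), rng.normal(size=5)
W2, b2 = rng.normal(size=(2, 5)), rng.normal(size=2)

two_layers = W2 @ (W1 @ x + b1) + b2       # "deep" linear network
W, b = W2 @ W1, W2 @ b1 + b2               # equivalent single layer
one_layer = W @ x + b

print(np.allclose(two_layers, one_layer))  # True
```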
Activation Function
We want activation functions that are non-linear and not hard thresholds (0/1), because a smooth activation has a slope, meaning we can perform gradient-based learning.
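As a small sketch of why the slope matters (the sigmoid here is an assumed example, not the only choice): a hard 0/1 threshold has zero derivative wherever it is defined, so no gradient flows through it, while a sigmoid has a nonzero slope everywhere.

```python
import numpy as np

def step(z):
    # hard 0/1 threshold: flat on both sides, derivative 0 where defined
    return (z > 0).astype(float)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)       # strictly positive slope

z = np.linspace(-3.0, 3.0, 7)
print(step(z))                 # 0s and 1s: no gradient signal to learn from
print(sigmoid_grad(z))         # nonzero everywhere: gradient-based learning works
```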
SU-CS229 OCT272025
topological sort
A topological sort of a directed graph is an ordering of the vertices such that if there is an edge \(A \to B\), then \(A\) comes before \(B\) in the ordering. For directed acyclic graphs (DAGs), a topological sort always exists.
solving topological sort with depth first search
In a DAG, ordering the vertices from largest to smallest depth-first-search finish time always gives a topological sort.
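A minimal sketch of this DFS-based procedure, assuming the graph is given as an adjacency-list dict (that representation, and the example graph at the bottom, are assumptions for illustration):

```python
def topological_sort(graph):
    """Return the vertices of a DAG in topological order.

    graph: dict mapping each vertex to a list of its out-neighbors.
    Vertices are recorded as DFS finishes them, then the list is
    reversed, so larger finish times come first.
    """
    visited = set()
    order = []

    def dfs(u):
        visited.add(u)
        for v in graph.get(u, []):
            if v not in visited:
                dfs(v)
        order.append(u)           # u "finishes" here

    for u in graph:
        if u not in visited:
            dfs(u)

    return list(reversed(order))  # decreasing finish time

# Example: edges A->B, A->C, B->D, C->D
print(topological_sort({"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}))
```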
Decision Tree
Let’s consider greedy Decision Tree learning.
greedy procedure
- initial tree—no split: always predict the majority class \(\hat{y} = \text{maj}\qty(y), \forall x\)
- for each feature \(h\qty(x)\)
- split data according to feature
- compute classification error of the split
- choose \(h^{*}\qty(x)\) with the lowest error after splitting
- loop until a stopping criterion is met (see the sketch after the stopping criteria)
stopping criteria
- all examples at a node agree on \(y\) (the tree fits the data exactly)
- exhausted on all features (nothing to split on)
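A minimal sketch of this greedy procedure for binary 0/1 features, using classification error and the stopping criteria above; the nested-dict tree representation and the binary feature encoding are assumptions made for the example.

```python
import numpy as np

def majority(y):
    """Majority class at a node."""
    values, counts = np.unique(y, return_counts=True)
    return values[np.argmax(counts)]

def split_error(X, y, j):
    """Classification error if we split on binary feature j and
    predict the majority class on each side."""
    err = 0
    for side in (0, 1):
        mask = X[:, j] == side
        if mask.any():
            err += np.sum(y[mask] != majority(y[mask]))
    return err / len(y)

def build_tree(X, y, features):
    # stopping criteria: the node agrees on y, or no features are left
    if len(np.unique(y)) == 1 or not features:
        return majority(y)
    # greedy step: choose the feature with the lowest error after splitting
    best = min(features, key=lambda j: split_error(X, y, j))
    rest = [j for j in features if j != best]
    children = {}
    for side in (0, 1):
        mask = X[:, best] == side
        children[side] = build_tree(X[mask], y[mask], rest) if mask.any() else majority(y)
    return {"feature": best, "children": children}
```

Prediction then just follows the chosen feature at each internal node until it reaches a leaf label.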
additional information
threshold splitting
We are going to perform what’s called a “threshold split”: for a continuous feature, choose thresholds between consecutive data points as the “split values” to check. How do we deal with splitting on the same feature more than once? We can keep splitting until we get bored or we overfit.
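A small sketch of generating candidate thresholds as midpoints between consecutive distinct values of a continuous feature (the midpoint convention is an assumption for this example):

```python
import numpy as np

def candidate_thresholds(x):
    """Candidate split values for a continuous feature:
    midpoints between consecutive distinct sorted values."""
    values = np.unique(x)                     # sorted, distinct values
    return (values[:-1] + values[1:]) / 2.0   # one threshold between each pair

print(candidate_thresholds(np.array([3.0, 1.0, 2.0, 2.0])))  # [1.5 2.5]
```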
