Posts

SU-CS161 OCT282025

Last edited: October 10, 2025

Key Sequence

Notation

New Concepts

Important Results / Claims

Questions

Interesting Factoids

SU-CS224N APR092024

Last edited: October 10, 2025

Neural networks are powerful because of the self-organization of the intermediate layers.

Neural Network Layer

\begin{equation} z = Wx + b \end{equation}

for the pre-activation output, and the activations:

\begin{equation} a = f(z) \end{equation}

where the activation function \(f\) is applied element-wise.
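
As a concrete illustration, here is a minimal NumPy sketch of a single layer’s forward pass. The dimensions (3 inputs, 2 units) and the choice of sigmoid for \(f\) are assumptions for the example only.

  import numpy as np

  def sigmoid(z):
      # the element-wise activation f (sigmoid chosen only as an example)
      return 1.0 / (1.0 + np.exp(-z))

  def layer_forward(W, b, x):
      # z = Wx + b, then a = f(z) applied element-wise
      z = W @ x + b
      a = sigmoid(z)
      return z, a

  # hypothetical sizes: 3 input features, 2 units in the layer
  rng = np.random.default_rng(0)
  W = rng.normal(size=(2, 3))
  b = np.zeros(2)
  x = np.array([1.0, -0.5, 2.0])

  z, a = layer_forward(W, b, x)
  print(z.shape, a.shape)  # (2,) (2,)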

Why are NNs Non-Linear?

  1. stacking multiple linear layers adds no representational power, since a composition of linear maps is still linear (though even big linear networks can have better learning/convergence properties! see the sketch after this list)
  2. most real-world relationships are non-linear!
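
To see point 1 numerically, here is a small sketch (example values assumed) showing that stacking two linear layers with no activation in between collapses into a single linear layer:

  import numpy as np

  rng = np.random.default_rng(1)
  W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
  W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
  x = rng.normal(size=3)

  # two stacked linear layers, no activation in between
  stacked = W2 @ (W1 @ x + b1) + b2

  # an equivalent single linear layer: W = W2 W1, b = W2 b1 + b2
  W, b = W2 @ W1, W2 @ b1 + b2
  collapsed = W @ x + b

  print(np.allclose(stacked, collapsed))  # True: no added representational power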

Activation Function

We want activation functions that are non-linear but not hard thresholds (0/1), because such a function has a usable slope, meaning we can perform gradient-based learning.
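
For intuition, a small sketch (values assumed) comparing a hard 0/1 threshold with a sigmoid: the threshold’s numerical derivative is zero away from the jump, giving gradient-based learning no signal, while the sigmoid has a usable slope everywhere.

  import numpy as np

  def hard_threshold(z):
      # 0/1 step activation: flat away from the jump at 0
      return (z > 0).astype(float)

  def sigmoid(z):
      return 1.0 / (1.0 + np.exp(-z))

  z = np.array([-2.0, -0.5, 0.5, 2.0])
  eps = 1e-4

  # central-difference derivatives
  d_step = (hard_threshold(z + eps) - hard_threshold(z - eps)) / (2 * eps)
  d_sig = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)

  print(d_step)  # all zeros: no gradient signal
  print(d_sig)   # nonzero slopes: usable for gradient descent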

SU-CS229 OCT272025

Last edited: October 10, 2025

topological sort

Last edited: October 10, 2025

A topological sort of a directed graph is an ordering of the vertices such that, if there’s an edge \(A \to B\), then \(A\) comes before \(B\) in the ordering. For directed acyclic graphs (DAGs), a topological sort always exists.

solving topological sort with depth first search

Run depth first search on the DAG and order the vertices by decreasing finish time: in a DAG, every edge goes from a vertex with a larger finish time to one with a smaller finish time, so this ordering is a topological sort.
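
A minimal sketch of this (graph represented, by assumption, as a dict of adjacency lists): run DFS, record each vertex when it finishes, and read the vertices off in decreasing finish time.

  def topological_sort(graph):
      # graph: dict mapping each vertex to a list of its out-neighbors (a DAG)
      visited = set()
      finish_order = []                 # vertices in increasing finish time

      def dfs(u):
          visited.add(u)
          for v in graph.get(u, []):
              if v not in visited:
                  dfs(v)
          finish_order.append(u)        # u finishes after all its descendants

      for u in graph:
          if u not in visited:
              dfs(u)

      return finish_order[::-1]         # decreasing finish time = topological order

  # example DAG: A -> B -> D, A -> C -> D
  dag = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": []}
  print(topological_sort(dag))          # e.g. ['A', 'C', 'B', 'D']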

Decision Tree

Last edited: October 10, 2025

Let’s consider greedy Decision Tree learning.

greedy procedure

  1. initial tree—no split: always predict the majority class \(\hat{y} = \text{maj}\qty(y), \forall x\)
  2. for each feature \(h\qty(x)\)
    1. split data according to feature
    2. compute classification error of the split
  3. choose \(h^{*}\qty(x)\) with the lowest error after splitting
  4. loop until stop
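
Here is a minimal sketch of the split-selection step (steps 2 and 3), assuming binary 0/1 features in a NumPy array X and labels y; this data representation is an assumption for the example, not from the notes.

  import numpy as np

  def majority(y):
      # majority-class prediction maj(y)
      values, counts = np.unique(y, return_counts=True)
      return values[np.argmax(counts)]

  def classification_error(y):
      # error of predicting the majority class at this node
      return np.mean(y != majority(y)) if len(y) else 0.0

  def best_split(X, y):
      # try each binary feature h(x); keep the one with lowest weighted error
      n = len(y)
      best_j, best_err = None, classification_error(y)  # "no split" baseline
      for j in range(X.shape[1]):
          left, right = y[X[:, j] == 0], y[X[:, j] == 1]
          err = (len(left) * classification_error(left)
                 + len(right) * classification_error(right)) / n
          if err < best_err:
              best_j, best_err = j, err
      return best_j, float(best_err)     # best_j is None if no split helps

  # tiny example: feature 1 perfectly predicts y
  X = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
  y = np.array([0, 0, 1, 1])
  print(best_split(X, y))                # (1, 0.0)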

stopping criteria

  1. each node agrees on \(y\) (the tree fits data exactly)
  2. exhausted on all features (nothing to split on)

additional information

threshold splitting

For continuous features, we perform what’s called a “threshold split”: choose thresholds halfway between consecutive sorted feature values as the candidate “split values” to check. How do we deal with splitting on the same feature more than once? We can keep splitting until we get bored or we overfit.
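
A small sketch of generating the candidate thresholds for one continuous feature (the midpoint-between-distinct-values convention is assumed here):

  import numpy as np

  def candidate_thresholds(feature_values):
      # midpoints between consecutive distinct sorted values of one feature
      v = np.unique(feature_values)      # sorted, distinct values
      return (v[:-1] + v[1:]) / 2.0

  x = np.array([2.0, 7.0, 7.0, 3.5, 10.0])
  print(candidate_thresholds(x))         # [2.75 5.25 8.5 ]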