_index.org

SU-CS161 Things to Review

Last edited: October 10, 2025
  • log laws, exponent laws and general non-discrete math stuff

  • distributions and infinite series, math53 content

  • combinations

    • \(\mqty(n \\ k) = \mqty(n-1 \\ k-1) + \mqty(n-1 \\ k)\)
    • \(\mqty(n \\k) = \mqty(n \\ n-k)\)
  • binomial theorem: \(\qty(a+b)^{n} = \sum_{k=0}^{n} \mqty(n \\k)a^{k} b^{n-k}\)

  • geometric sum

SU-CS229 Distribution Sheet

Last edited: October 10, 2025

Here’s a bunch of exponential family distributions. Recall:

\begin{equation} p\qty(x;\eta) = b\qty(x) \exp \qty(\eta^{T}T\qty(x) - a\qty(\eta)) \end{equation}

normal, berunouli, posisson, binomial, negative binomial, geometric, chi-squared, exponential are all in

normal distribution

\(\mu\) the mean, \(\sigma\) the variance

\begin{equation} p\qty(x;\mu, \Sigma) = \frac{1}{\qty(2\pi)^{\frac{|x|}{2}} \text{det}\qty(\Sigma)^{\frac{1}{2}}} \exp \qty(-\frac{1}{2} \qty(x-\mu)^{T}\Sigma^{-1}\qty(x-\mu)) \end{equation}

\begin{equation} p\qty(x; \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \qty({ \frac{-(x-u)^{2}}{2 \sigma^{2}}}) \end{equation}

\begin{equation} \mathbb{E}[x] = \mu \end{equation}

\begin{equation} \text{Var}\qty [x] = \sigma^{2} \end{equation}

This is exponential family distribution. For \(\sigma^{2} = 1\):

SU-CS229 OCT292025

Last edited: October 10, 2025

xavier initialization

Last edited: October 10, 2025

An neural network initialization scheme that tries to avoid Vanishing Gradients.

Consider \(Wx\) step in a neural network:

\begin{equation} o_{i} = \sum_{j=1}^{n_{\text{in}}} w_{ij} x_{j} \end{equation}

The variance of this:

\begin{equation} \text{Var}\qty [o_{i}] = n_{\text{in}} \sigma^{2} v^{2} \end{equation}

Kosaraju's Algorithm

Last edited: October 10, 2025

Finding strongly connected components.

  1. Do DFS starting at random node to create a DFS forest (i.e. disconnected DFS trees), keeping track of finishing times
  2. Reverse all edges in the graph
  3. Do DFS again to create another DFS forest, start DFS procedure with the vertices with the largest finishing time in step 1
  4. strongly connected components are the trees that changed from the first in the second DFS graph

main idea

Recall that finishing time helps us track topological sort. After reversing edges, the condensation graph node with the largest finish time has no edges going out (i.e. it was originally the first in the topological sort); if we run DFS there, we will find exactly that component. After DFSing there, we will have “removed” that component (having visited it), and repeat.