logistic regression
Last edited: October 10, 2025
Using Linear Regression to perform a classification task \(y \in \qty {0,1}\) sounds kind of silly. Instead, we assume that \(y\) follows a Bernoulli distribution.
requirements
\begin{align} h_{\theta}\qty(x) &= g\qty(\theta^{T} x) \\ &= \frac{1}{1+e^{-\theta^{T}x}} \end{align}
That is, we apply a sigmoid function to our linear regression output to perform classification. The sigmoid output can then be interpreted as a probability:
\begin{equation} p\qty(y=1|x;\theta) = h_{\theta}\qty(x) \end{equation}
\begin{equation} p\qty(y=0|x;\theta) = 1-h_{\theta}\qty(x) \end{equation}
\begin{equation} y \in \qty {0,1} \end{equation}
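A minimal sketch of the hypothesis above in plain Python (the names `sigmoid` and `h` are our own, and vectors are plain lists):

```python
import math

def sigmoid(z):
    # logistic function g(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + math.exp(-z))

def h(theta, x):
    # hypothesis h_theta(x) = g(theta^T x), read as p(y=1 | x; theta)
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

# g(0) = 0.5: the decision boundary sits where theta^T x = 0
print(sigmoid(0.0))  # -> 0.5
```

Note that \(h_{\theta}\qty(x)\) always lies strictly between 0 and 1, which is what lets us read it as a probability.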
regularization
regularization penalizes large weights to reduce overfitting
- creates a data interpolation that contains intentional error (by throwing away/hiding parameters), missing some/all of the data points
- this makes the resulting function more “predictable”/“smooth”
There is, therefore, a trade-off: you sacrifice quality on the ORIGINAL data for better accuracy on new points. If you regularize too much, you will underfit.
Motivation
Recall that, for linear regression, we want to optimize:
\begin{equation} \min_{\theta} \frac{1}{2} \sum_{i=1}^{n} \norm{ y^{(i)} - \theta^{\top}x^{(i)} }^{2} \end{equation}
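The standard L2 (ridge) variant, a common extension of the objective above, adds a penalty \(\frac{\lambda}{2}\norm{\theta}^{2}\) that discourages large weights (\(\lambda \geq 0\) controls the regularization strength):

\begin{equation} \min_{\theta} \frac{1}{2} \sum_{i=1}^{n} \norm{ y^{(i)} - \theta^{\top}x^{(i)} }^{2} + \frac{\lambda}{2} \norm{\theta}^{2} \end{equation}

Setting \(\lambda = 0\) recovers plain linear regression; larger \(\lambda\) shrinks \(\theta\) toward zero.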
depth first search
Each vertex keeps track of three states: unvisited, in progress, done. We also keep track of when we enter each vertex and when we finish with it.
# initialize vertex.state = UNVISITED across graph
def dfs(vertex, currentTime):
    vertex.startTime = currentTime
    currentTime += 1
    vertex.state = IN_PROGRESS
    for v in vertex.neighbors:
        if v.state == UNVISITED:  # to prevent loops
            currentTime = dfs(v, currentTime)
    currentTime += 1
    vertex.finishTime = currentTime
    vertex.state = DONE
    return currentTime
This explores only the connected component containing the starting vertex, so to cover the whole graph you repeat the DFS by iterating through all vertices and restarting from each unvisited one.
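A self-contained, runnable version of this routine, using a dict-based adjacency list instead of vertex objects (the graph and names here are illustrative):

```python
UNVISITED, IN_PROGRESS, DONE = 0, 1, 2

def dfs(graph, state, start_time, finish_time, v, current_time):
    # record entry time, then recurse into unvisited neighbors
    start_time[v] = current_time
    current_time += 1
    state[v] = IN_PROGRESS
    for w in graph[v]:
        if state[w] == UNVISITED:  # to prevent loops
            current_time = dfs(graph, state, start_time, finish_time, w, current_time)
    current_time += 1
    finish_time[v] = current_time  # record exit time
    state[v] = DONE
    return current_time

# "d" is disconnected, so a single DFS from "a" would miss it
graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"], "d": []}
state = {v: UNVISITED for v in graph}
start, finish = {}, {}
t = 0
for v in graph:  # restart from every unvisited vertex
    if state[v] == UNVISITED:
        t = dfs(graph, state, start, finish, v, t)
print(start["a"], finish["a"])  # -> 0 6
```

The outer loop is what makes the traversal cover every connected component, not just the one containing the first vertex.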
graph
undirected graph
constituents
- vertices \(V\), a set of nodes
- edges \(E\), a set of unordered pairs of vertices
requirements
We write:
\begin{equation} G = \qty(V, E) \end{equation}
We say:
- the degree of a vertex is the number of edges incident to it
- vertices connected by an edge are neighbors
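A minimal adjacency-list sketch of an undirected graph (the vertex names and variable names here are illustrative):

```python
# G = (V, E): edges are unordered sets of vertices, so each edge
# is stored in both endpoints' adjacency lists
V = {"a", "b", "c"}
E = {frozenset({"a", "b"}), frozenset({"a", "c"})}

adj = {v: [] for v in V}
for e in E:
    u, w = tuple(e)
    adj[u].append(w)
    adj[w].append(u)

degree = {v: len(adj[v]) for v in adj}  # degree = number of incident edges
print(degree["a"])  # -> 2; "b" and "c" are the neighbors of "a"
```

For a directed graph, each edge would instead be an ordered pair and appear only in the source vertex's list.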
additional information
directed graph
constituents
- vertices \(V\), a set of nodes
- edges \(E\), a set of directed edges
requirements
We write:
\begin{equation} G = \qty(V, E) \end{equation}
Markov Chain Monte-Carlo
Start with an initial sample \(\tau\)
At each iteration…
- sample \(\tau ’ \sim g\qty(\cdot | \tau)\) (for instance, \(\tau’ \sim \mathcal{N}\qty(\cdot | \tau, \sigma^{2})\))
- accept the sample with probability \(\min\qty(1, \frac{\bar{p} \qty(\tau’) g\qty(\tau | \tau’)}{\bar{p}\qty(\tau) g\qty(\tau’ | \tau)})\), otherwise keep \(\tau\) (this is called the Metropolis-Hastings criterion)
intuition
The kernel is often chosen to be symmetric, so:
\begin{equation} g\qty(\tau | \tau’) = g\qty(\tau’ | \tau) \end{equation}
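With a symmetric Gaussian kernel the \(g\) terms cancel and the acceptance ratio reduces to \(\bar{p}\qty(\tau’)/\bar{p}\qty(\tau)\). A runnable sketch under that assumption (the target \(\bar{p}\) here is an unnormalized standard normal, chosen just for illustration):

```python
import math
import random

random.seed(0)

def p_bar(tau):
    # unnormalized target density (standard normal up to a constant)
    return math.exp(-tau * tau / 2)

def metropolis(n_samples, sigma=1.0):
    # symmetric proposal g(. | tau) = N(tau, sigma^2), so the g terms
    # cancel and we accept with probability min(1, p_bar(tau')/p_bar(tau))
    tau = 0.0
    samples = []
    for _ in range(n_samples):
        tau_prime = random.gauss(tau, sigma)
        if random.random() < min(1.0, p_bar(tau_prime) / p_bar(tau)):
            tau = tau_prime      # accept the proposal
        samples.append(tau)      # otherwise keep tau
    return samples

samples = metropolis(20000)
print(sum(samples) / len(samples))  # sample mean should be near 0
```

Note that \(\bar{p}\) only ever appears in a ratio, which is why MCMC works with unnormalized densities.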
we want to sample from areas of high density more often. Consider:
