SU-CS161 SEP232025
Divide and Conquer
Break the problem into smaller sub-problems.
example: multiplication
Multiplying by powers of ten is easy, so we can break one large multiplication into several smaller ones.
For instance, we can split an \(n\)-digit integer into:
\begin{equation} [x_1, x_2, \dots, x_{\frac{n}{2}}] \times 10^{\frac{n}{2}} + [x_{\frac{n}{2}+1}, x_{\frac{n}{2}+2}, \dots] \end{equation}
Writing \(x = a \times 10^{\frac{n}{2}} + b\) and \(y = c \times 10^{\frac{n}{2}} + d\) for the two halves, we can multiply two large values by writing:
\begin{align} x \times y &= \qty(a \times 10^{\frac{n}{2}} + b ) \qty(c \times 10^{\frac{n}{2}} + d) \\ &= \qty(a \times c ) 10^{n} + \qty(a \times d + c \times b) 10^{\frac{n}{2}} + \qty( b \times d) \end{align}
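A minimal sketch of this recursion in Python (the function name `dc_multiply` and the power-of-two digit count are assumptions of mine; this is the plain four-product split):

```python
def dc_multiply(x: int, y: int, n: int) -> int:
    """Multiply two n-digit integers by splitting each into halves:
    x = a * 10^(n/2) + b and y = c * 10^(n/2) + d, so that
    x * y = ac * 10^n + (ad + cb) * 10^(n/2) + bd.
    Assumes n is a power of two for simplicity.
    """
    if n == 1:
        return x * y  # base case: single-digit product
    half = n // 2
    a, b = divmod(x, 10 ** half)  # high and low halves of x
    c, d = divmod(y, 10 ** half)  # high and low halves of y
    ac = dc_multiply(a, c, half)
    ad = dc_multiply(a, d, half)
    cb = dc_multiply(c, b, half)
    bd = dc_multiply(b, d, half)
    return ac * 10 ** n + (ad + cb) * 10 ** half + bd

assert dc_multiply(1234, 5678, 4) == 1234 * 5678
```

Each call spawns four half-size multiplications, which is the starting point for asking whether we can get away with fewer.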
SU-CS161 SEP252025
Key Sequence
Notation
New Concepts
Important Results / Claims
quantifying success
- checking a single example
- checking performance on IID examples
- worst-case analysis
SU-CS229 SEP222025
Logistics


Driving Forces Behind AI
Three key players
Computation
- cloud compute
- GPUs
Data
Web data is a powerful source for building more general intelligence.
Algorithms
The algorithms are often old-school, but they scale once given enough data.
Key Trends in AI
- moving away from symbolic methods: don’t code it, learn it
- moving away from small networks: deep learning
- LLMs popularizing machine learning
- more ethical AI
Scale Emergence
Capabilities in LMs emerge past a certain scale, i.e. there is a sudden jump in performance once models get large enough.
SU-CS229 SEP242025
Supervised learning!
Some Notational Conventions
- \(n\): number of training examples
- \(m\): number of features
- \(x\): input feature(s)
- \(y\): output/target feature
- \(\theta\): parameters
- \(h_{\theta}\qty(x)\): the predictor function
And so, a tuple \(\qty(x,y)\) is a particular training example. We use the superscript-parentheses notation to index samples, so \(\qty(x^{(i)}, y^{(i)})\) is the \(i\)th training example. We typically use \(h\qty(x)\) as the predictor, with parameters \(\theta_{j}\).
New Concepts
- Linear Regression
- least-squares error
- gradient descent
- gradient descent for least-squares error
- variants (see the sketch after this list)
  - summing over the whole dataset: batch gradient descent
  - pick one sample and run it: stochastic gradient descent
  - pick a few samples and run them: mini-batch gradient descent
- a primer on Vector Calculus
- Normal Equation
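A minimal NumPy sketch of the pieces above, under assumptions of my own (the toy data, step size `lr`, and iteration count are placeholders): the three gradient descent variants differ only in how many examples feed each step, and the Normal Equation recovers the same answer in closed form.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: n examples, m features, plus an intercept column x_0 = 1
n, m = 100, 3
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])
true_theta = rng.normal(size=m + 1)
y = X @ true_theta + 0.01 * rng.normal(size=n)

def grad(theta, Xb, yb):
    """Gradient of the least-squares cost (1/2) sum_i (h_theta(x) - y)^2
    over the batch (Xb, yb), where h_theta(x) = theta^T x."""
    return Xb.T @ (Xb @ theta - yb)

theta = np.zeros(m + 1)
lr = 0.001
for _ in range(1000):
    # batch gradient descent: sum over the whole dataset each step
    theta -= lr * grad(theta, X, y)
    # stochastic gradient descent would instead pick one sample:
    #   i = rng.integers(n); theta -= lr * grad(theta, X[i:i+1], y[i:i+1])
    # mini-batch picks a few:
    #   idx = rng.choice(n, size=16, replace=False)
    #   theta -= lr * grad(theta, X[idx], y[idx])

# Normal Equation: solve X^T X theta = X^T y in closed form
theta_closed = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose(theta, theta_closed, atol=1e-3)
```

Because the least-squares cost is convex, all three variants head toward the same optimum; stochastic and mini-batch just trade noisier steps for much cheaper ones.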
SU-CS229 SEP292025
Key Sequence
Review linear regression in more depth, build some intuition, then discuss logistic regression and give an optimization method for it (a sketch follows the notation recap below).
Notation
Recall the notation:
- \(\qty(x^{(i)}, y^{(i)})\), ith example
- \(x^{(i)} \in \mathbb{R}^{m+1}\), where \(x_0^{(i)} = 1\) for all \(i\) (intercept term)
- \(y^{(i)} \in \mathbb{R}\)
\(n\): number of examples; \(m\): number of features
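Since this lecture promises an optimization method for logistic regression, here is a minimal sketch in the notation above (the toy data, learning rate, and iteration count are assumptions of mine), using gradient ascent on the log-likelihood:

```python
import numpy as np

rng = np.random.default_rng(1)

# toy binary data: x^{(i)} in R^{m+1} with the intercept x_0^{(i)} = 1
n, m = 200, 2
X = np.hstack([np.ones((n, 1)), rng.normal(size=(n, m))])
true_theta = np.array([0.5, 2.0, -1.0])
y = (rng.uniform(size=n) < 1 / (1 + np.exp(-X @ true_theta))).astype(float)

def h(theta, X):
    """Logistic hypothesis h_theta(x) = sigmoid(theta^T x)."""
    return 1 / (1 + np.exp(-X @ theta))

# gradient ascent on the log-likelihood
#   l(theta) = sum_i [ y^(i) log h(x^(i)) + (1 - y^(i)) log(1 - h(x^(i))) ]
# whose gradient is X^T (y - h_theta(X))
theta = np.zeros(m + 1)
lr = 0.01
for _ in range(500):
    theta += lr * X.T @ (y - h(theta, X))

print(theta)  # should land near true_theta
```

Note that the update has the same shape as the least-squares gradient step, just with the sigmoid hypothesis swapped in.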
