old Transformers
Transformers have collapsed large pipelines into a single system.
“Transformers verticalized tasks in 2013 EMNLP; various domains”
Process
- Multiple manual systems that talk to each other have been replaced by neurons talking to each other
- General word embeddings like Word2Vec
- Sequence-to-sequence modeling built on those vectors, which is more general: learning variable-length representations
- From LSTMs to encoder-decoder architectures: Google's Neural Machine Translation system (2016), an LSTM seq2seq model that was state of the art (sketched below)
So: big complicated pipelines turn into one homogeneous system.
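A minimal sketch of the encoder-decoder seq2seq pattern above, in PyTorch; the class name, single LSTM layer, and dimensions are illustrative assumptions, not GNMT's actual configuration:

```python
import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    """Toy LSTM encoder-decoder: variable-length source -> variable-length target."""
    def __init__(self, src_vocab, tgt_vocab, emb_dim=256, hidden=512):
        super().__init__()
        self.src_emb = nn.Embedding(src_vocab, emb_dim)   # learned word embeddings
        self.tgt_emb = nn.Embedding(tgt_vocab, emb_dim)
        self.encoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.decoder = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        # Encode the variable-length source into a hidden state...
        _, state = self.encoder(self.src_emb(src_ids))
        # ...then decode the target sequence conditioned on that state.
        dec_out, _ = self.decoder(self.tgt_emb(tgt_ids), state)
        return self.out(dec_out)  # per-step logits over the target vocabulary
```

One homogeneous model stands in for the multi-stage pipeline described above.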
On the Clock
Every day, at 11:00 PM exactly, I stop time tracking.
And it somehow feels like the most liberating time of my day, when I truly feel like I have my time back to myself.
One-Shot Deformation
We have an expression:
\begin{equation} B = \frac{FL^{3}}{3EI}, \qquad [B] = \frac{N\,m^{3}}{\mathrm{Pa}\cdot m^{4}} = \frac{N\,m^{3}}{\frac{N}{m^{2}}\,m^{4}} = m \end{equation}
With constants:
- \(B\): \(m\), deflection at the point of force application
- \(F\): \(N\), force applied
- \(L\): \(m\), distance between fixed point and point of force application
- \(E\): \(\mathrm{Pa}=\frac{N}{m^{2}}\), elastic modulus
- \(I\): \(m^{4}\), second moment of area
As measured:
- \(B\): \(9.15 \cdot 10^{-4} m\)
- \(F\): \(20N\)
- \(L\): \(9.373 \cdot 10^{-2} m\)
- \(I\): \(1.37 \cdot 10^{-10} m^{4}\) = \(\frac{WH^{3}}{12}\) = \(\frac{(6.25 \cdot 10^{-3})(6.4 \cdot 10^{-3})^{3}}{12}\)
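From the measured values, one can back out the implied elastic modulus by rearranging the expression above (a worked sketch; treating \(E\) as the quantity to be compared is an assumption here):
\begin{equation} E = \frac{FL^{3}}{3BI} = \frac{(20)(9.373 \cdot 10^{-2})^{3}}{3(9.15 \cdot 10^{-4})(1.37 \cdot 10^{-10})} \approx 4.4 \cdot 10^{10}\ \mathrm{Pa} \approx 44\ \mathrm{GPa} \end{equation}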
Theoretical:
online m
online planning
For problems with a large possible future state space, we can't just iterate over all states to get a value function for every state, and THEN go about using the greedy policy to perform actions.
Therefore, we employ a technique called receding horizon planning: plan from the current state out to a maximum horizon \(d\), figure out what the best SINGLE action would be given that information for the current state only, execute it, and then replan.
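A minimal sketch of that loop in Python, assuming a generic deterministic model interface; `actions(state)` and `step(state, action) -> (next_state, reward)` are hypothetical placeholders, not a specific library:

```python
def rollout_value(step, actions, state, depth, gamma=0.95):
    """Depth-limited forward search: best discounted return reachable
    from `state` within `depth` steps, exhaustively enumerating actions."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions(state):
        next_state, reward = step(state, a)
        best = max(best, reward + gamma * rollout_value(step, actions, next_state, depth - 1, gamma))
    return best if best != float("-inf") else 0.0  # no actions available

def receding_horizon_action(step, actions, state, horizon, gamma=0.95):
    """Plan from the current state out to `horizon` and return only the
    best single action; the caller executes it and then replans."""
    best_a, best_v = None, float("-inf")
    for a in actions(state):
        next_state, reward = step(state, a)
        v = reward + gamma * rollout_value(step, actions, next_state, horizon - 1, gamma)
        if v > best_v:
            best_a, best_v = a, v
    return best_a
```

At execution time the agent calls `receding_horizon_action` once per step with the maximum horizon \(d\), applies the returned action, and replans from whatever state it lands in.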