Posts

Misc. Financial Market Questions

Last edited: August 8, 2025

Why so many stock exchanges?

Because the SEC more or less allows you to register new ones as desired.

Why doesn’t the market trade 24 hours a day?

Because institutional traders can realistically trade only about 2 hours a day: at the beginning of the day or at the end of the day. Outside those windows, there is not enough volume for institutional traders to trade at their size. See Volume Profile.
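A minimal sketch of what a volume profile looks like, assuming trades are bucketed by hour of day. The trade list is made up for illustration and `volume_profile` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical intraday trades as (hour_of_day, shares_traded); numbers are made up.
trades = [
    (9, 500), (9, 700), (10, 200), (11, 100), (12, 80),
    (13, 90), (14, 150), (15, 600), (15, 580),
]

def volume_profile(trades):
    """Return the fraction of total volume traded in each hour."""
    totals = defaultdict(int)
    for hour, shares in trades:
        totals[hour] += shares
    grand_total = sum(totals.values())
    return {hour: totals[hour] / grand_total for hour in sorted(totals)}

profile = volume_profile(trades)
for hour, share in profile.items():
    print(f"{hour:02d}:00  {share:.1%}")
```

In this toy data the first and last hours carry most of the volume, which is the U-shape the answer above relies on.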

What’s a good “full view” of the stock?

  • The order book! You can actually see it by paying money to the exchange.
  • You want to subscribe to every order for every exchange.
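As a rough sketch of the "every order for every exchange" idea, the feeds can be merged into one consolidated book by summing resting size at each price level. The exchange names, prices, and sizes below are made up, and `consolidate` is a hypothetical helper.

```python
from collections import defaultdict

# Hypothetical resting orders: (exchange, side, price, size). All values are made up.
orders = [
    ("EXCH_A", "bid", 99.98, 300),
    ("EXCH_B", "bid", 99.99, 100),
    ("EXCH_A", "ask", 100.02, 200),
    ("EXCH_B", "ask", 100.01, 150),
    ("EXCH_C", "ask", 100.01, 50),
]

def consolidate(orders):
    """Sum resting size at each price level across all exchanges."""
    book = {"bid": defaultdict(int), "ask": defaultdict(int)}
    for _exchange, side, price, size in orders:
        book[side][price] += size
    return book

book = consolidate(orders)
best_bid = max(book["bid"])  # highest price a buyer is willing to pay
best_ask = min(book["ask"])  # lowest price a seller is willing to accept
print("best bid:", best_bid, "size:", book["bid"][best_bid])
print("best ask:", best_ask, "size:", book["ask"][best_ask])
```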

How do large traders strategically break up stock orders?

“How long should I take?”
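One way to make that question concrete is to cap each child order at a fixed share of the expected volume in its time bucket and count how many buckets the parent order needs. This is only a sketch under assumptions: the participation cap, the expected volumes, and the `schedule` helper are all hypothetical and not taken from these notes.

```python
def schedule(parent_shares, expected_volume, max_participation_pct=10):
    """Slice a parent order into child orders, one per time bucket.

    Each child is capped at max_participation_pct percent of that
    bucket's expected market volume.
    """
    remaining = parent_shares
    slices = []
    for volume in expected_volume:
        if remaining <= 0:
            break
        cap = volume * max_participation_pct // 100
        child = min(remaining, cap)
        slices.append(child)
        remaining -= child
    return slices, remaining

# Hypothetical expected volume per hourly bucket (heavy at the open and close).
expected_volume = [12000, 2000, 1000, 800, 900, 1500, 11800]
slices, unfilled = schedule(2000, expected_volume)
print(slices, "unfilled:", unfilled)
```

With these made-up numbers the order takes all seven buckets to finish, with most of the shares executed in the first bucket.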

ML COVID Drug Discovery

Last edited: August 8, 2025

Focus on the protease: inhibiting it helps block viral replication, and it is conserved across most coronaviruses, so it is a good starting point for drug development.

  • Take smaller binding fragments covering the binding site, and combine them together
  • Try to combine these fragments together into a molecule that fits well with the binding site

protease inhibition is usually achieved with a covalent peptide bond, but this crowd-sourcing effort showed that

machine-learning rapid library synthesis

  1. begin with some guess for the model molecule
  2. then, use ML to quickly scan through a bunch of changes to the molecule (“ML-prioritized rapid library synthesis”)
  3. pick the best candidates and repeat
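The loop above can be sketched as a toy greedy search. Everything here is a stand-in: molecules are plain strings over a made-up alphabet, and `score` is a hypothetical placeholder for the ML model (it just counts matches against a hidden target), so this illustrates only the guess, modify, pick, repeat structure and not real chemistry.

```python
ALPHABET = "CNOS"  # toy "atoms"; a real pipeline would edit actual molecules

def score(molecule, target="CCNOC"):
    """Hypothetical stand-in for an ML model: count positions matching a hidden target."""
    return sum(a == b for a, b in zip(molecule, target))

def propose_modifications(molecule):
    """Enumerate every single-position substitution of the current molecule."""
    variants = []
    for i in range(len(molecule)):
        for atom in ALPHABET:
            if atom != molecule[i]:
                variants.append(molecule[:i] + atom + molecule[i + 1:])
    return variants

def optimize(initial_guess, rounds=5):
    best = initial_guess                          # step 1: start from a guess
    for _ in range(rounds):
        candidates = propose_modifications(best)  # step 2: scan many changes
        top = max(candidates, key=score)          # step 3: pick the best one
        if score(top) <= score(best):
            break                                 # no candidate improves; stop
        best = top                                # ...and repeat
    return best

result = optimize("SSSSS")
print(result, score(result))
```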

Molecular Transformer

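Throw the reaction into an LLM, as words: write the reaction out as a text string (SMILES) and treat predicting the products as translating the reactant string into the product string.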
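A minimal sketch of turning a reaction into "words": split a reaction SMILES string into tokens that a sequence model could consume. The regex is a simplified pattern written for this example (an assumption, not the tokenizer any particular model uses), and the reaction is an illustrative esterification (acetic acid plus ethanol giving ethyl acetate).

```python
import re

# Simplified SMILES token pattern (an assumption for illustration):
# bracket atoms, two-letter halogens, single letters, digits, and bond/branch symbols.
TOKEN_PATTERN = re.compile(r"\[[^\]]+\]|Br|Cl|[A-Za-z]|\d|[=#()+\-./\\@:%~*$]")

def tokenize(smiles):
    """Split a SMILES string into tokens, failing loudly on unknown characters."""
    tokens = TOKEN_PATTERN.findall(smiles)
    if "".join(tokens) != smiles:
        raise ValueError(f"unrecognized character in {smiles!r}")
    return tokens

reaction = "CC(=O)O.OCC>>CC(=O)OCC"
reactants, product = reaction.split(">>")
source_tokens = tokenize(reactants)  # the "sentence" the model reads
target_tokens = tokenize(product)    # the "sentence" the model should write
print(" ".join(source_tokens))
print(" ".join(target_tokens))
```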

MLE for Conditional Gaussian

Last edited: August 8, 2025

Let’s say we want to find MLE parameters \(\theta\) for a conditional Gaussian with constant variance. That is:

\begin{equation} p\qty(y_{i} | x_{i}) = \mathcal{N} \qty(y_{i}|f_{\theta } \qty(x_{i}), \sigma^{2}) \end{equation}

and we have a corresponding dataset: \(\qty(x_1, y_1), …, \qty(x_{m}, y_{m})\).

The MLE is then:

\begin{align} \hat{\theta} &= \arg\max_{\theta} \sum_{i=1}^{m} \log p\qty(y_{i}|x_{i}) \\ &= \arg\max_{\theta} \sum_{i=1}^{m} \log \mathcal{N} \qty(y_{i}| f_{\theta} \qty(x_{i}), \sigma^{2}) \\ &= \arg\max_{\theta } \sum_{i=1}^{m} \log \frac{1}{\sqrt{{2 \pi \sigma^{2}}}} \exp \qty(- \frac{\qty(y_{i}- f_{\theta }\qty(x_{i}))^{2}}{2\sigma^{2}}) \end{align}
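
Continuing the derivation as a sketch, using only the assumption stated above that \(\sigma^{2}\) is constant and does not depend on \(\theta\): taking the log splits each Gaussian into a constant term and a squared-error term, and dropping everything that does not depend on \(\theta\) gives:

\begin{align} \hat{\theta} &= \arg\max_{\theta} \sum_{i=1}^{m} \qty[- \frac{1}{2} \log \qty(2 \pi \sigma^{2}) - \frac{\qty(y_{i} - f_{\theta}\qty(x_{i}))^{2}}{2\sigma^{2}}] \\ &= \arg\max_{\theta} - \frac{1}{2\sigma^{2}} \sum_{i=1}^{m} \qty(y_{i} - f_{\theta}\qty(x_{i}))^{2} \\ &= \arg\min_{\theta} \sum_{i=1}^{m} \qty(y_{i} - f_{\theta}\qty(x_{i}))^{2} \end{align}

So under this model, maximizing the likelihood is the same as minimizing the sum of squared errors.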

MLlib

Last edited: August 8, 2025

MLlib is a machine learning library built on top of Spark.

from pyspark.mllib.clustering import KMeans

model = KMeans.train(rdd, k)

where you pass MLlib a PySpark RDD of feature vectors (rdd) and the number of clusters (k)