Houjun Liu

Random Walk Hypothesis

The Random Walk Hypothesis is a financial econometric hypothesis that stock returns have the same distribution and are independent of each other: that they are random variables and not predictable in a macro space.

To set up the random walk hypothesis, let’s begin with some time \(t\), an asset return \(r_t\), some time elapsed \(k\), and some future asset return \(r_{t+k}\).

We will create two random variables \(f(r_t)\) and \(g(r_{t+k})\), where \(f\) and \(g\) are arbitrary functions we apply to analyze the return at that time.

The Random Walk Hypothesis tells us that, at any two distinct times, you cannot use the behavior of \(r_t\) to predict anything about \(r_{t+k}\), under any kind of analysis \(f\) or \(g\); that is:

\begin{equation} Cov[f(r_t), g(r_{t+k})] = 0 \end{equation}

So, all Random Walk Hypothesis models leverage the above result: the returns at the two times do not evolve together, and they are independently, randomly distributed; they are random variables.
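The zero-covariance condition can be illustrated with a small simulation. This is only a sketch under stated assumptions: the returns are simulated as i.i.d. Gaussian draws (not real market data), and the choices \(f(r) = r^2\), \(g(r) = |r|\), the lag `k`, and the sample size are arbitrary picks for illustration.

```python
import random


def sample_cov(xs, ys):
    """Sample covariance of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n - 1)


def f(r):
    """Hypothetical analysis f: squared return."""
    return r * r


def g(r):
    """Hypothetical analysis g: absolute return."""
    return abs(r)


rng = random.Random(0)   # fixed seed so the run is repeatable
n, k = 50_000, 5         # assumed sample size and lag

# Assumption: i.i.d. Gaussian returns with 1% per-period volatility.
returns = [rng.gauss(0.0, 0.01) for _ in range(n)]

f_now = [f(r) for r in returns[:-k]]     # f(r_t)
g_later = [g(r) for r in returns[k:]]    # g(r_{t+k})
g_now = [g(r) for r in returns[:-k]]     # g(r_t), same-time baseline

cov_lagged = sample_cov(f_now, g_later)  # should be close to zero
cov_same = sample_cov(f_now, g_now)      # clearly nonzero, for comparison

print(f"Cov[f(r_t), g(r_t+k)] = {cov_lagged:.3e}")
print(f"Cov[f(r_t), g(r_t)]   = {cov_same:.3e}")
```

The same-time covariance is included only as a yardstick: it shows what a genuinely dependent pair looks like at this scale, so the lagged value can be judged as "near zero" relative to something.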

For the market to be a typical Random Walk, the central limit theorem has to hold for the returns. This is usually possible, but if the variance of the return is not finite, the central limit theorem will not hold for the returns, which means that they will not be normal. Of course, the returns do not have to satisfy the central limit theorem; in that case we use other limiting distributions but still model the return as a random variable under the Random Walk Hypothesis.

return (FinMetrics)

Importantly: it’s not the price that follows the random walk; it is the RETURN that follows the walk. If it were the price, then it would be possible for the price to become negative. Return, technically, is defined by:

\begin{equation} R_t = \frac{p_t-p_{t-1}}{p_{t-1}} \end{equation}

However, we really are interested in the natural log of the prices:

\begin{equation} r_t = \log(p_t) - \log(p_{t-1}) \approx R_t \end{equation}

We can do this because, for \(x\) close to \(1\), \(\log x \approx x-1\); here \(x = p_t / p_{t-1}\), so \(\log x \approx R_t\) when the price change is small.
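The approximation \(r_t \approx R_t\) can be checked numerically. The sketch below uses a made-up price series (hypothetical values, not real data) and compares the simple return with the log return for each step.

```python
import math

# Hypothetical prices, chosen only to illustrate small day-to-day moves.
prices = [100.0, 101.0, 99.5, 100.2]

simple_returns = []
log_returns = []
for p_prev, p_now in zip(prices, prices[1:]):
    simple_returns.append((p_now - p_prev) / p_prev)           # R_t
    log_returns.append(math.log(p_now) - math.log(p_prev))     # r_t

for R, r in zip(simple_returns, log_returns):
    print(f"R_t = {R:+.5f}   r_t = {r:+.5f}   gap = {R - r:.2e}")
```

For moves of around one percent the two definitions agree to roughly four decimal places; the gap widens as the price change grows.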

We do this because, if we wanted to combine the returns over the last \(n\) days, with \(R_t\) you’d have to multiply the gross returns \(1 + R_t = p_t / p_{t-1}\):

\begin{equation} \frac{p_{t+1}}{p_t} \cdot \frac{p_t}{p_{t-1}} = \frac{p_{t+1}}{p_{t-1}} \end{equation}

This is bad, because of the central limit theorem. The theorem applies to a random variable built by adding \(n\) items and normalizing, not by multiplying them together over a time range. We want to be able to add.

Therefore, we use \(r_t\), which achieves the same compounding by adding (see the log laws: \(\log(ab) = \log a + \log b\)).
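A short sketch of why log returns are convenient: gross returns compound by multiplication, while log returns compound by addition, and both recover the same total price change. The prices are hypothetical.

```python
import math

# Hypothetical prices, not real data.
prices = [100.0, 102.0, 99.0, 103.0]

# Gross returns p_t / p_{t-1} and log returns log(p_t) - log(p_{t-1}).
gross = [p_now / p_prev for p_prev, p_now in zip(prices, prices[1:])]
log_returns = [math.log(x) for x in gross]

total_gross = math.prod(gross)   # multiply to compound gross returns
total_log = sum(log_returns)     # add to compound log returns

print(total_gross, prices[-1] / prices[0])
print(total_log, math.log(prices[-1] / prices[0]))
```

Each printed pair matches up to floating-point rounding: the product telescopes to \(p_{\text{last}} / p_{\text{first}}\), and the sum telescopes to its logarithm.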

But either way, with enough, we know that \(r_t\) is independent and identically distributed.

time series analysis

Over some number of days \(k\), we have:

\begin{equation} Y_{k} = \sum_{i=1}^{k} x_{i} \end{equation}

Given that the \(x_{i}\) are randomly distributed: \(\{x_{i}\}_{i=1}^{N}\). This becomes the foundation of time series analysis. The problem of course becomes harder when the values drift against each other, are not independent, etc. We can use the Martingale Model to take the generic random walk to a more dependent model.
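The partial sums \(Y_k\) can be built as a running total of the increments. The sketch below assumes i.i.d. Gaussian increments with an arbitrary 1% standard deviation, purely for illustration.

```python
import random
from itertools import accumulate

rng = random.Random(1)   # fixed seed so the run is repeatable
k = 250                  # assumed number of days

# Assumption: i.i.d. Gaussian increments x_1, ..., x_k.
x = [rng.gauss(0.0, 0.01) for _ in range(k)]

# Y[j - 1] holds Y_j = x_1 + ... + x_j, for j = 1, ..., k.
Y = list(accumulate(x))

print(f"Y_1 = {Y[0]:+.4f}, Y_{k} = {Y[-1]:+.4f}")
```

Plotting `Y` against the day index gives the familiar wandering path of a random walk, while `x` itself looks like noise around zero.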

CJ test

If you have some volatility measurement, we first know that, by the Random Walk Hypothesis, we have:

\begin{equation} X_{k} \sim N(0,\sigma^{2}) \end{equation}

Given some future return, you hope that:

\begin{equation} Y_{k}=\sum_{i=1}^{k}X_{i}\sim N(0,k\sigma^{2}) \end{equation}

If so, and the log returns carry, say, \(20\%\) volatility over \(12\) periods, then a statistically significant return has to stand out against a per-period standard deviation of:

\begin{equation} \sigma =\frac{0.2}{\sqrt{12}} \end{equation}

Getting a statistically significant difference from it is hard.
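The square-root scaling behind \(\sigma = 0.2/\sqrt{12}\) can be checked by simulation. This sketch assumes the \(20\%\) figure is the standard deviation of the 12-period sum and that the per-period log returns are independent Gaussian draws; the number of simulated paths is arbitrary.

```python
import math
import random
import statistics

rng = random.Random(2)        # fixed seed so the run is repeatable
total_vol = 0.2               # assumed volatility of the 12-period sum
periods = 12
sigma = total_vol / math.sqrt(periods)   # per-period standard deviation

n_paths = 20_000              # assumed number of simulated 12-period sums

# Each Y is a sum of 12 independent N(0, sigma^2) draws.
Y = [
    sum(rng.gauss(0.0, sigma) for _ in range(periods))
    for _ in range(n_paths)
]

est_vol = statistics.stdev(Y)
print(f"per-period sigma          = {sigma:.4f}")
print(f"std of the 12-period sums = {est_vol:.4f}")
```

Because variances of independent terms add, the standard deviation of the sum grows like \(\sqrt{12}\,\sigma\), which brings the estimate back to roughly \(0.2\).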