Let’s begin. We want to create test for the linearity of a few assets, for whether or not they follow the CAPM.
Note that we will be using the Sharpe-Linter version of CAPM:
\begin{equation} E[R_{i}-R_{f}] = \beta_{im} E[(R_{m}-R_{f})] \end{equation}
\begin{equation} \beta_{im} := \frac{Cov[(R_{i}-R_{f}),(R_{m}-R_{f})]}{Var[R_{m}-R_{f}]} \end{equation}
Recall that we declare \(R_{f}\) (the risk-free rate) to be non-stochastic.
Let us begin. We will create a generic function to analyze some given stock.
We will first import our utilities
import pandas as pd
import numpy as np
Let’s load the data from our market (NYSE) as well as our 10 year t-bill data.
t_bill = pd.read_csv("./linearity_test_data/10yrTBill.csv")
nyse = pd.read_csv("./linearity_test_data/NYSEComposite.csv")
nyse.head()
Date Close
0 11/7/2013 16:00:00 9924.37
1 11/8/2013 16:00:00 10032.14
2 11/11/2013 16:00:00 10042.95
3 11/12/2013 16:00:00 10009.84
4 11/13/2013 16:00:00 10079.89
Excellent. Let’s load in the data for that stock.
def load_stock(stock):
return pd.read_csv(f"./linearity_test_data/{stock}.csv")
load_stock("LMT").head()
Date Close
0 11/7/2013 16:00:00 136.20
1 11/8/2013 16:00:00 138.11
2 11/11/2013 16:00:00 137.15
3 11/12/2013 16:00:00 137.23
4 11/13/2013 16:00:00 137.26
And now, let’s load all three stocks, then concatenate them all into a big-ol DataFrame.
# load data
df = { "Date": nyse.Date,
"NYSE": nyse.Close,
"TBill": t_bill.Close,
"LMT": load_stock("LMT").Close,
"TWTR": load_stock("TWTR").Close,
"MCD": load_stock("MCD").Close }
# convert to dataframe
df = pd.DataFrame(df)
# drop empty
df.dropna(inplace=True)
df
Date NYSE TBill LMT TWTR MCD
0 11/7/2013 16:00:00 9924.37 26.13 136.20 44.90 97.20
1 11/8/2013 16:00:00 10032.14 27.46 138.11 41.65 97.01
2 11/11/2013 16:00:00 10042.95 27.51 137.15 42.90 97.09
3 11/12/2013 16:00:00 10009.84 27.68 137.23 41.90 97.66
4 11/13/2013 16:00:00 10079.89 27.25 137.26 42.60 98.11
... ... ... ... ... ... ...
2154 10/24/2022 16:00:00 14226.11 30.44 440.11 39.60 252.21
2155 10/25/2022 16:00:00 14440.69 31.56 439.30 39.30 249.28
2156 10/26/2022 16:00:00 14531.69 33.66 441.11 39.91 250.38
2157 10/27/2022 16:00:00 14569.90 34.83 442.69 40.16 248.36
2158 10/28/2022 16:00:00 14795.63 33.95 443.41 39.56 248.07
[2159 rows x 6 columns]
Excellent. Now, let’s convert all of these values into daily log-returns (we don’t really care about the actual pricing.)
log_returns = df[["NYSE", "TBill", "LMT", "TWTR", "MCD"]].apply(np.log, inplace=True)
df.loc[:, ["NYSE", "TBill", "LMT", "TWTR", "MCD"]] = log_returns
df
Date NYSE TBill LMT TWTR MCD
0 11/7/2013 16:00:00 9.202749 3.263084 4.914124 3.804438 4.576771
1 11/8/2013 16:00:00 9.213549 3.312730 4.928050 3.729301 4.574814
2 11/11/2013 16:00:00 9.214626 3.314550 4.921075 3.758872 4.575638
3 11/12/2013 16:00:00 9.211324 3.320710 4.921658 3.735286 4.581492
4 11/13/2013 16:00:00 9.218298 3.305054 4.921877 3.751854 4.586089
... ... ... ... ... ... ...
2154 10/24/2022 16:00:00 9.562834 3.415758 6.087025 3.678829 5.530262
2155 10/25/2022 16:00:00 9.577805 3.451890 6.085183 3.671225 5.518577
2156 10/26/2022 16:00:00 9.584087 3.516310 6.089294 3.686627 5.522980
2157 10/27/2022 16:00:00 9.586713 3.550479 6.092870 3.692871 5.514879
2158 10/28/2022 16:00:00 9.602087 3.524889 6.094495 3.677819 5.513711
[2159 rows x 6 columns]
We will now calculate the log daily returns. But before—the dates are no longer relavent here so we drop them.
df
Date NYSE TBill LMT TWTR MCD
0 11/7/2013 16:00:00 9.202749 3.263084 4.914124 3.804438 4.576771
1 11/8/2013 16:00:00 9.213549 3.312730 4.928050 3.729301 4.574814
2 11/11/2013 16:00:00 9.214626 3.314550 4.921075 3.758872 4.575638
3 11/12/2013 16:00:00 9.211324 3.320710 4.921658 3.735286 4.581492
4 11/13/2013 16:00:00 9.218298 3.305054 4.921877 3.751854 4.586089
... ... ... ... ... ... ...
2154 10/24/2022 16:00:00 9.562834 3.415758 6.087025 3.678829 5.530262
2155 10/25/2022 16:00:00 9.577805 3.451890 6.085183 3.671225 5.518577
2156 10/26/2022 16:00:00 9.584087 3.516310 6.089294 3.686627 5.522980
2157 10/27/2022 16:00:00 9.586713 3.550479 6.092870 3.692871 5.514879
2158 10/28/2022 16:00:00 9.602087 3.524889 6.094495 3.677819 5.513711
[2159 rows x 6 columns]
And now, the log returns! We will shift this data by one column and subtract.
returns = df.drop(columns=["Date"]) - df.drop(columns=["Date"]).shift(1)
returns.dropna(inplace=True)
returns
NYSE TBill LMT TWTR MCD
1 0.010801 0.049646 0.013926 -0.075136 -0.001957
2 0.001077 0.001819 -0.006975 0.029570 0.000824
3 -0.003302 0.006161 0.000583 -0.023586 0.005854
4 0.006974 -0.015657 0.000219 0.016568 0.004597
5 0.005010 -0.008476 0.007476 0.047896 -0.005622
... ... ... ... ... ...
2154 0.005785 0.004940 -0.023467 -0.014291 0.001349
2155 0.014971 0.036133 -0.001842 -0.007605 -0.011685
2156 0.006282 0.064420 0.004112 0.015402 0.004403
2157 0.002626 0.034169 0.003575 0.006245 -0.008100
2158 0.015374 -0.025590 0.001625 -0.015053 -0.001168
[2158 rows x 5 columns]
We are now ready to run the correlation study.
Let’s now subtract everything by the risk-free rate (dropping the rfr itself):
risk_free_excess = returns.drop(columns="TBill").apply(lambda x: x-returns.TBill)
risk_free_excess
NYSE LMT TWTR MCD
1 -0.038846 -0.035720 -0.124783 -0.051603
2 -0.000742 -0.008794 0.027751 -0.000995
3 -0.009463 -0.005577 -0.029747 -0.000307
4 0.022630 0.015875 0.032225 0.020254
5 0.013486 0.015952 0.056372 0.002854
... ... ... ... ...
2154 0.000845 -0.028406 -0.019231 -0.003591
2155 -0.021162 -0.037975 -0.043738 -0.047818
2156 -0.058138 -0.060308 -0.049017 -0.060017
2157 -0.031543 -0.030593 -0.027924 -0.042269
2158 0.040964 0.027215 0.010537 0.024422
[2158 rows x 4 columns]
Actual Regression
It is now time to perform the actual linear regression!
import statsmodels.api as sm
Let’s work with Lockheed Martin first, fitting an ordinary least squares. Remember that the OLS functions reads the endogenous variable first (for us, the return of the asset.)
# add a column of ones to our input market excess returns
nyse_with_bias = sm.add_constant(risk_free_excess.NYSE)
# perform linreg
lmt_model = sm.OLS(risk_free_excess.LMT, nyse_with_bias).fit()
lmt_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: LMT R-squared: 0.859
Model: OLS Adj. R-squared: 0.859
Method: Least Squares F-statistic: 1.312e+04
Date: Mon, 31 Oct 2022 Prob (F-statistic): 0.00
Time: 10:39:24 Log-Likelihood: 6318.9
No. Observations: 2158 AIC: -1.263e+04
Df Residuals: 2156 BIC: -1.262e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0004 0.000 1.311 0.190 -0.000 0.001
NYSE 0.9449 0.008 114.552 0.000 0.929 0.961
==============================================================================
Omnibus: 423.969 Durbin-Watson: 1.965
Prob(Omnibus): 0.000 Jarque-Bera (JB): 11575.074
Skew: -0.160 Prob(JB): 0.00
Kurtosis: 14.341 Cond. No. 29.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Based on the constants row, we can see that—within \(95\%\) confidence—the intercept is generally \(0\) and CAPM applies. However, we do see a slight positive compared to the market. Furthermore, we can see that the regression has a beta value of \(0.9449\) — according the CAPM model, it being slightly undervarying that the market.
We can continue with the other stocks.
# perform linreg
mcd_model = sm.OLS(risk_free_excess.MCD, nyse_with_bias).fit()
mcd_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: MCD R-squared: 0.887
Model: OLS Adj. R-squared: 0.887
Method: Least Squares F-statistic: 1.697e+04
Date: Mon, 31 Oct 2022 Prob (F-statistic): 0.00
Time: 10:39:24 Log-Likelihood: 6551.1
No. Observations: 2158 AIC: -1.310e+04
Df Residuals: 2156 BIC: -1.309e+04
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0003 0.000 1.004 0.315 -0.000 0.001
NYSE 0.9651 0.007 130.287 0.000 0.951 0.980
==============================================================================
Omnibus: 323.911 Durbin-Watson: 1.988
Prob(Omnibus): 0.000 Jarque-Bera (JB): 3032.550
Skew: 0.395 Prob(JB): 0.00
Kurtosis: 8.753 Cond. No. 29.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Same thing as before, we are within \(95\%\) confidence having a intercept of \(0\) (with a slight positive edge), and it looks like MacDonald’s vary a little bit more than Lockheed Martin. The food industry is probably a tougher business than that in defense.
Lastly, to analyze the recently delisted Twitter!
# perform linreg
twtr_model = sm.OLS(risk_free_excess.TWTR, nyse_with_bias).fit()
twtr_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: TWTR R-squared: 0.522
Model: OLS Adj. R-squared: 0.522
Method: Least Squares F-statistic: 2357.
Date: Mon, 31 Oct 2022 Prob (F-statistic): 0.00
Time: 10:39:24 Log-Likelihood: 4307.1
No. Observations: 2158 AIC: -8610.
Df Residuals: 2156 BIC: -8599.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -0.0002 0.001 -0.346 0.730 -0.002 0.001
NYSE 1.0173 0.021 48.549 0.000 0.976 1.058
==============================================================================
Omnibus: 661.205 Durbin-Watson: 1.986
Prob(Omnibus): 0.000 Jarque-Bera (JB): 15925.609
Skew: -0.883 Prob(JB): 0.00
Kurtosis: 16.191 Cond. No. 29.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Evidently, Twitter is much more variable. It looks like it has a nontrivial bias (the intercept being -0.001 being within the \(95\%\) confidence band — that the security is possibly significantly underperforming the CAPM expectation in the market.) Furthermore, we have a positive beta value: that the asset is more variable than the market.
manual regression
We can also use the betas formula to manually calculate what we expect for the beta values (i.e. as if they were one IID random variable.)
risk_free_cov = risk_free_excess.cov()
risk_free_cov
NYSE LMT TWTR MCD
NYSE 0.001143 0.001080 0.001163 0.001103
LMT 0.001080 0.001188 0.001116 0.001083
TWTR 0.001163 0.001116 0.002264 0.001155
MCD 0.001103 0.001083 0.001155 0.001200
Finally, to construct the beta values. Recall that:
\begin{equation} \beta_{im} := \frac{Cov[(R_{i}-R_{f}),(R_{m}-R_{f})]}{Var[R_{m}-R_{f}]} \end{equation}
and that:
\begin{equation} Var[X] = Cov[X,X], \forall X \end{equation}
# get the market variance (covariance with itself)
market_variation = risk_free_cov.NYSE.NYSE
# calculate betas
betas = {"LMT": (risk_free_cov.LMT.NYSE/market_variation),
"TWTR": (risk_free_cov.TWTR.NYSE/market_variation),
"MCD": (risk_free_cov.MCD.NYSE/market_variation)}
# and make dataframe
betas = pd.Series(betas)
betas
LMT 0.944899
TWTR 1.017294
MCD 0.965081
dtype: float64
Apparently, all of our assets swing less than the overall NYSE market! Especially Lockheed—it is only \(94.4\%\) of the market variation. Furthermore, it is interesting to see that Twitter swings much more dramatically compared to the market.
Fund creation
We will now create two funds with the three securities, one with equal parts and one which attempts to maximizes the bias (max returns) while minimizing the beta variance value compared to the market.
“Equal-Parts Fund” (“Fund 1”)
We will now create a fund in equal parts. Here it is:
fund_1_returns = returns.LMT + returns.TWTR + returns.MCD
fund_1_returns
1 -0.063167
2 0.023420
3 -0.017149
4 0.021384
5 0.049750
...
2154 -0.036409
2155 -0.021132
2156 0.023917
2157 0.001720
2158 -0.014596
Length: 2158, dtype: float64
We will calculate the excess returns of this fund:
fund_1_excess = fund_1_returns-returns.TBill
fund_1_excess
1 -0.112813
2 0.021600
3 -0.023310
4 0.037041
5 0.058226
...
2154 -0.041349
2155 -0.057265
2156 -0.040503
2157 -0.032449
2158 0.010994
Length: 2158, dtype: float64
And then perform a regression
# perform linreg
fund_1_model = sm.OLS(fund_1_excess, nyse_with_bias).fit()
fund_1_model.summary()
OLS Regression Results
==============================================================================
Dep. Variable: y R-squared: 0.473
Model: OLS Adj. R-squared: 0.473
Method: Least Squares F-statistic: 1935.
Date: Mon, 31 Oct 2022 Prob (F-statistic): 3.01e-302
Time: 10:39:24 Log-Likelihood: 3869.5
No. Observations: 2158 AIC: -7735.
Df Residuals: 2156 BIC: -7724.
Df Model: 1
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 0.0007 0.001 0.841 0.401 -0.001 0.002
NYSE 1.1290 0.026 43.993 0.000 1.079 1.179
==============================================================================
Omnibus: 600.456 Durbin-Watson: 2.022
Prob(Omnibus): 0.000 Jarque-Bera (JB): 8416.514
Skew: -0.914 Prob(JB): 0.00
Kurtosis: 12.501 Cond. No. 29.6
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
Surprisingly, we have now created a significantly riskier investment that, though riskier, generates a much higher probability of reward (\(+0.001\) is now within the \(99\%\) band!)
A Better Fund
To me, this is the payoff of this assignment. We will now use CAPM to create the “best” fund combination—given some variance, the funds which match CAPM. To do this, let’s create a generic linear combination of the assets.
import sympy as sym
x = sym.Symbol('x')
y = sym.Symbol('y')
z = sym.Symbol('z')
fund_2_returns = x*returns.LMT + y*returns.TWTR + z*returns.MCD
fund_2_returns
1 0.0139260753744255*x - 0.0751364261353569*y - ...
2 -0.00697525170622448*x + 0.0295704573211193*y ...
3 0.000583132897928884*x - 0.0235859990058791*y ...
4 0.000218587198947517*x + 0.016568426347233*y +...
5 0.00747599199607762*x + 0.0478955096700351*y -...
...
2154 -0.0234665578621085*x - 0.0142913301107561*y +...
2155 -0.00184214468578059*x - 0.0076045993852194*y ...
2156 0.00411172646842317*x + 0.0154024001854269*y +...
2157 0.00357547337231878*x + 0.0062445563228315*y -...
2158 0.00162509910496933*x - 0.0150529686289622*y -...
Length: 2158, dtype: object
Excellent. We will also calculate the excess returns of this fund:
fund_2_excess = fund_2_returns-returns.TBill
Y = fund_2_excess.to_numpy()
Y
[0.0139260753744255*x - 0.0751364261353569*y - 0.00195664549320096*z - 0.0496463208073039
-0.00697525170622448*x + 0.0295704573211193*y + 0.000824317408861575*z - 0.00181917459665826
0.000583132897928884*x - 0.0235859990058791*y + 0.00585367525146019*z - 0.00616055581298536
...
0.00411172646842317*x + 0.0154024001854269*y + 0.00440300114913317*z - 0.0644196927849867
0.00357547337231878*x + 0.0062445563228315*y - 0.0081004573348249*z - 0.0341688956152497
0.00162509910496933*x - 0.0150529686289622*y - 0.00116834209450634*z + 0.0255902303732043]
We cast this type to a numpy array because we are about to perform some matrix operations upon it.
Now, let us perform the actual linear regression ourselves. Recall that the pseudoinverse linear regression estimator is:
\begin{equation} \beta = (X^{T}X)^{-1}X^{T}Y \end{equation}
We have already our \(Y\) as a vector (above), and our \(X\) is:
X = nyse_with_bias.to_numpy()
X
[[ 1.00000000e+00 -3.88457302e-02]
[ 1.00000000e+00 -7.42217926e-04]
[ 1.00000000e+00 -9.46284244e-03]
...
[ 1.00000000e+00 -5.81378271e-02]
[ 1.00000000e+00 -3.15429207e-02]
[ 1.00000000e+00 4.09643405e-02]]
We now have our matrices, let’s perform the linear regression!
linear_model = np.linalg.inv((X.transpose()@X))@X.transpose()@Y
linear_model
[0.000544056413840724*x - 6.62061061591867e-5*y + 0.000429966553373172*z - 0.000178620725465344
0.0457830563134785*x + 0.118178191274045*y + 0.0659651260604729*z + 0.899115719100281]
Excellent. So we now have two rows; the top row represents the “bias”—how much deviation there is from CAPM, and the bottom row represents the “rate”—the “beta” value which represents how much excess variance there is.
We can will solve for a combination of solutions to give us specific values of returns vs risk. We will set the asset to learn exactly as much as the market (i.e. no bias).
deviance_expr = linear_model[0]
deviance_expr
0.000544056413840724*x - 6.62061061591867e-5*y + 0.000429966553373172*z - 0.000178620725465344
We will now try to make variance exactly as much as that in the market.
risk_expr = linear_model[1] - 1
risk_expr
0.0457830563134785*x + 0.118178191274045*y + 0.0659651260604729*z - 0.100884280899719
Let us now calculate the boundary condition of our optimization problem by solving an expression in these two expressions.
solution = sym.solvers.solve([deviance_expr, risk_expr], x,y,z)
solution
{x: 0.412737013327711 - 0.819584899551304*z, y: 0.693765220909132 - 0.24067066980814*z}
Excellent. Let us recalculate our optimization objective (“deviance”—return) in terms of these new solutions. We aim now to maximize this expression by minimizing (i.e. our optimizer minimizes) the negative thereof—recalling that scypy works as a minimizer.
optim_objective = deviance_expr.subs(solution)-1e2
optim_objective
-5.04831636563563e-19*z - 100.0
We can now use this value to solve for a \(z\) value.
optim_solution = sym.solvers.solve([optim_objective], z)
optim_solution
{z: -1.98085842402250e+20}
Excellent. We can now solve for the rest of our values.
z0 = float(optim_solution[z])
x0 = solution[x].subs(z, z0)
y0 = solution[y].subs(z, z0)
(x0,y0,z0)
(1.62348165247784e+16, 4.76734523704593e+15, -1.980858424022502e+16)
This would create the following plan:
# solution
fund_2_nobias = x0*returns.LMT + y0*returns.TWTR + z0*returns.MCD
fund_2_nobias.mean()
0.009168283711770158
Recall that this is the performance of the balanced portfolio:
fund_1_returns.mean()
0.0009224705380695683
Finally, let’s plot the prices of our various funds:
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import seaborn as sns
from datetime import datetime
sns.set()
fund_2_price = x0*df.LMT + y0*df.TWTR + z0*df.MCD
fund_1_price = df.LMT + df.TWTR
fund_l_price = df.LMT
fund_t_price = df.TWTR
dates = df.Date.apply(lambda x:datetime.strptime(x, "%m/%d/%Y %H:%M:%S"))
sns.lineplot(x=dates, y=fund_2_price.apply(sym.Float).astype(float))
sns.lineplot(x=dates, y=fund_1_price.apply(sym.Float).astype(float))
sns.lineplot(x=dates, y=fund_l_price.apply(sym.Float).astype(float))
sns.lineplot(x=dates, y=fund_t_price.apply(sym.Float).astype(float))
plt.gca().xaxis.set_major_locator(mdates.YearLocator())
plt.gca().xaxis.set_major_formatter(mdates.DateFormatter('%Y'))
plt.gca().set_ylabel("Price")
plt.show()
None
![](/ox-hugo/2022-10-29_23-33-46_screenshot.png)
Recall that we didn’t actually buy any MacDonald’s. So, we have—blue, being our “optimal” portfolio, yellow, our balanced portfolio, green, being Lockheed, and red, being Twitter.
Our portfolio works surprisingly well!