I am following a course in econometrics but unfortunately I'm stuck. I hope you can help me.
The following model is given:
https://i.stack.imgur.com/DfYCN.png
The OLS estimator of beta is given by: https://i.stack.imgur.com/r7bHD.png
But when I run the following Python script with a very large sample size, the estimates are terrible and do not converge to the true values. Could anyone explain this to me, please?
```python
import numpy as np

n = 100000
beta1 = 5.
beta2 = -.02
beta3 = .2
constant_term = np.ones(n)
X1 = np.linspace(10, 30, n)
X2 = np.linspace(0, 10, n)
X = np.column_stack((constant_term, X1, X2))
Y = np.zeros(n)
for i in range(n):
    u = np.random.normal(0., 1.)
    Y[i] = beta1 + beta2 * X[i, 1] + beta3 * X[i, 2] + u
Xt = np.transpose(X)
beta_ols = np.linalg.inv(Xt @ X) @ Xt @ Y
print(beta_ols)
```

It returns, for example, `[ 4.66326351 -0.32281745  0.87127398]`, but the true values are `[5., -.02, .2]`.
I am aware that there are also functions that can do this for me, but I want to do it manually to understand the material better.
Thanks!
Your variables `X1` and `X2` are collinear, i.e. not linearly independent: since `X1 = 10 + 2 * X2`, the `X1` column is an exact linear combination of the constant column and `X2`. Hence your matrix `Xt @ X` is not of full rank. The eigenvalues

```python
np.linalg.eig(Xt @ X)[0]
```

print

```
[4.65788929e+07, 3.72227442e-11, 1.87857084e+05]
```

Note that the second one is essentially zero; it is not exactly zero only because of floating-point rounding. When you invert this matrix you effectively divide by this very small number and massively lose precision. There are many ways to address this; for example, look up Tikhonov regularization. In Python you can use `Ridge` regression from scikit-learn.
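As a minimal sketch of that approach, here is your simulated data refit with scikit-learn's `Ridge` (the variable names mirror your script; `alpha=1.0` is just an illustrative penalty strength, not a tuned value):

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 100000
X1 = np.linspace(10, 30, n)
X2 = np.linspace(0, 10, n)      # still perfectly collinear with X1
X = np.column_stack((X1, X2))   # no constant column: Ridge fits its own intercept
Y = 5. - .02 * X1 + .2 * X2 + rng.normal(0., 1., n)

# alpha is the Tikhonov penalty strength; the penalty makes the
# regularized normal equations invertible even when X is rank-deficient
model = Ridge(alpha=1.0).fit(X, Y)
print(model.intercept_, model.coef_)
```

Note that Ridge will not recover your "true" betas here, because under exact collinearity they are not identified in the first place; it just returns one stable, finite solution among the infinitely many that fit the data equally well.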
Of course, if you do not want to get into the finer details, you can just modify your code to make sure your regressors are linearly independent, e.g. replace the `X2` initialization with

```python
X2 = np.linspace(0, 10, n)**2
```
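With that change the design matrix has full rank and OLS recovers the true coefficients. A self-contained sketch (using `np.linalg.lstsq`, which solves the least-squares problem without explicitly forming `inv(Xt @ X)` and is the numerically preferred route anyway):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100000
X1 = np.linspace(10, 30, n)
X2 = np.linspace(0, 10, n)**2   # no longer a linear function of the other columns
X = np.column_stack((np.ones(n), X1, X2))
Y = 5. - .02 * X1 + .2 * X2 + rng.normal(0., 1., n)

# lstsq solves min ||X b - Y||^2 via an SVD-based routine,
# avoiding the precision loss of inverting the normal equations
beta_ols, *_ = np.linalg.lstsq(X, Y, rcond=None)
print(beta_ols)   # close to [5., -.02, .2]
```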