I have a timeseries that looks like this:
         date       var1       var2      var3       var4       var5      var6
0  2004-09-30   6.252216  10.502101  4.965370  26.828754   3.321060  2.723686
1  2004-10-29   6.861840   9.776618  4.719399  27.621344   2.281346  4.449510
2  2004-11-30   8.171250  10.704045  4.949747  30.259377   2.064655  2.843745
3  2004-12-31   9.702585  11.371383  5.422177  33.578991  -1.008974  2.768579
4  2005-01-31  12.064022  10.628460  6.390097  35.135098  -0.385921  3.244204
I want to use sklearn's linear regression function to calculate the slope, y-intercept, and error (r-squared) on this timeseries. Note that all these values are already normalized via my own function, and there is no need for me to use sklearn's normalize parameter.
This is my code so far to do the regression on one column:
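from sklearn.linear_model import LinearRegression

reg = LinearRegression()
# Regress a single column (var1) on the date column
reg.fit(df.date.values.reshape(-1, 1), df.var1.values.reshape(-1, 1))
alpha = reg.intercept_[0]  # y-intercept
beta = reg.coef_[0][0]     # slope
error = reg.score(df.date.values.reshape(-1, 1), df.var1.values.reshape(-1, 1))  # R^2
values = {"alpha": alpha, "beta": beta, "error": error}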
My issue is that I don't know how to do the regression considering every column at once. On top of that, the R^2 calculation does not work.
R^2 aside, my slope and intercept for an individual column are incredibly small:
{'beta': -3.205305722098675e-17, 'alpha': 43.05076221170246}
How would I address these issues?
Instead of
df.var1.values.reshape(-1, 1)
Just pass
df.drop('date', axis=1) # .values should be optional here also
in its place.
This gives you df with all columns excluding date.
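As a minimal sketch of how that could look end to end, the example below fits all value columns in one call and collects the intercept, slope, and R^2 per column. It rebuilds a small frame from the sample rows in the question (only var1 to var3, for brevity). Two things here are assumptions rather than part of the original answer: the dates are converted to "days since the first observation" so that the regressor is numeric and the slope reads as change per day, and the per-column R^2 is taken from sklearn.metrics.r2_score with multioutput="raw_values", since reg.score on a multi-column target returns a single averaged value.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Sample rows copied from the question (var1..var3 only, to keep it short).
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2004-09-30", "2004-10-29", "2004-11-30", "2004-12-31", "2005-01-31"]
    ),
    "var1": [6.252216, 6.861840, 8.171250, 9.702585, 12.064022],
    "var2": [10.502101, 9.776618, 10.704045, 11.371383, 10.628460],
    "var3": [4.965370, 4.719399, 4.949747, 5.422177, 6.390097],
})

# Assumption: use days elapsed since the first date as the numeric regressor.
X = (df["date"] - df["date"].min()).dt.days.to_numpy().reshape(-1, 1)

# All columns except date become the (multi-column) target.
Y = df.drop("date", axis=1)

reg = LinearRegression().fit(X, Y)
Y_pred = reg.predict(X)

# With a 2-D target, intercept_ has one entry per column and
# coef_ has shape (n_columns, n_features); here n_features is 1.
results = pd.DataFrame(
    {
        "alpha": reg.intercept_,
        "beta": reg.coef_[:, 0],
        "r2": r2_score(Y, Y_pred, multioutput="raw_values"),
    },
    index=Y.columns,
)
print(results)
```

Each row of results corresponds to one column of the original frame. If the date column is instead passed as raw datetime values, the time unit may be very fine (for example nanoseconds), which could be one reason a slope looks extremely small; that is a guess about the question's data, not something the original answer confirms.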