Using sklearn for multiple linear regression

I have a timeseries that looks like this:

        date       var1       var2      var3       var4      var5      var6
0 2004-09-30   6.252216  10.502101  4.965370  26.828754  3.321060  2.723686
1 2004-10-29   6.861840   9.776618  4.719399  27.621344  2.281346  4.449510
2 2004-11-30   8.171250  10.704045  4.949747  30.259377  2.064655  2.843745
3 2004-12-31   9.702585  11.371383  5.422177  33.578991 -1.008974  2.768579
4 2005-01-31  12.064022  10.628460  6.390097  35.135098 -0.385921  3.244204

I want to use sklearn's LinearRegression to calculate the slope, y-intercept, and goodness of fit (R-squared) for this time series. Note that all of these values are already normalized by my own function, so I don't need sklearn's normalize parameter.

This is my code so far to do the regression on one column:

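from sklearn.linear_model import LinearRegression

reg = LinearRegression()
# Fit var1 against date; both are reshaped into single-column 2-D arrays
reg.fit(df.date.values.reshape(-1, 1), df.var1.values.reshape(-1, 1))
alpha = reg.intercept_[0]  # y-intercept
beta = reg.coef_[0][0]     # slope
error = reg.score(df.date.values.reshape(-1, 1), df.var1.values.reshape(-1, 1))  # R^2
values = {"alpha": alpha, "beta": beta, "error": error}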

My issue is that I don't know how to do the regression considering every column at once. On top of that, the R^2 calculation does not work.

R^2 aside, the slope I get for an individual column is incredibly small:

{'beta': -3.205305722098675e-17, 'alpha': 43.05076221170246}

How would I address these issues?

Instead of

df.var1.values.reshape(-1, 1)

just pass

df.drop('date', axis=1)  # .values should be optional here also

in its place.

This gives you df with every column except date, so all of the columns are used as regression targets in a single fit.
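
As a rough sketch of how that could look end to end, the example below fits every column against the date in one call and collects a slope, intercept, and R^2 per column. It rebuilds only var1 to var3 from the sample rows in the question. Converting the dates to days elapsed since the first row is my own assumption, not something stated in the question; it makes the slope read as "change per day". The names X, Y, and results are illustrative only.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Sample rows copied from the question (var1..var3 only, for brevity).
df = pd.DataFrame({
    "date": pd.to_datetime(["2004-09-30", "2004-10-29", "2004-11-30",
                            "2004-12-31", "2005-01-31"]),
    "var1": [6.252216, 6.861840, 8.171250, 9.702585, 12.064022],
    "var2": [10.502101, 9.776618, 10.704045, 11.371383, 10.628460],
    "var3": [4.965370, 4.719399, 4.949747, 5.422177, 6.390097],
})

# Assumption: use days since the first observation as the single feature.
X = (df["date"] - df["date"].min()).dt.days.to_numpy().reshape(-1, 1)

# All remaining columns become targets; sklearn fits one line per column.
Y = df.drop("date", axis=1)

reg = LinearRegression().fit(X, Y)
pred = reg.predict(X)

# coef_ has shape (n_targets, n_features); intercept_ has shape (n_targets,).
results = pd.DataFrame(
    {
        "alpha": reg.intercept_,
        "beta": reg.coef_[:, 0],
        # One R^2 per column instead of a single averaged score.
        "r2": r2_score(Y, pred, multioutput="raw_values"),
    },
    index=Y.columns,
)
print(results)
```

With several targets, reg.score(X, Y) returns a single number averaged over the columns (in recent scikit-learn versions), which is why r2_score with multioutput="raw_values" is used here to get one value per column.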
