Using sklearn for multiple linear regression

Question

I have a timeseries that looks like this:

        date       var1       var2      var3       var4      var5      var6
0 2004-09-30   6.252216  10.502101  4.965370  26.828754  3.321060  2.723686
1 2004-10-29   6.861840   9.776618  4.719399  27.621344  2.281346  4.449510
2 2004-11-30   8.171250  10.704045  4.949747  30.259377  2.064655  2.843745
3 2004-12-31   9.702585  11.371383  5.422177  33.578991 -1.008974  2.768579
4 2005-01-31  12.064022  10.628460  6.390097  35.135098 -0.385921  3.244204

I want to use sklearn's linear regression function to calculate the slope, y-intercept, and error (r-squared) on this timeseries. Note that all these values are already normalized via my own function, and there is no need for me to use sklearn's normalize parameter.

This is my code so far to do the regression on one column:

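from sklearn.linear_model import LinearRegression

reg = LinearRegression()
# X is the date column, y is a single variable; both reshaped to 2-D column vectors
reg.fit(df.date.values.reshape(-1, 1), df.var1.values.reshape(-1, 1))
alpha = reg.intercept_[0]  # y-intercept
beta = reg.coef_[0][0]     # slope
error = reg.score(df.date.values.reshape(-1, 1), df.var1.values.reshape(-1, 1))  # R^2
values = {"alpha": alpha, "beta": beta, "error": error}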

My issue is that I don't know how to do the regression considering every column at once. On top of that, the R^2 calculation does not work.

R^2 aside, my slope & intercept for some individual columns are incredibly small:

{'beta': -3.205305722098675e-17, 'alpha': 43.05076221170246}

How would I address these issues?

Answer 1

Instead of

df.var1.values.reshape(-1, 1)

just pass

df.drop('date', axis=1)  # .values should be optional here also

in its place.

This gives you the DataFrame with every column except date, so all of the variables are fitted against date in a single call.
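
To illustrate the suggestion, here is a minimal sketch of fitting every column at once. It makes a few assumptions that are not in the question: the sample frame is rebuilt from the first rows shown there (three columns only), the date is converted to days since the first observation (raw pandas datetimes are stored as nanoseconds since the epoch, which may be one reason the fitted slope looks so tiny), and the per-column R^2 is taken from r2_score with multioutput="raw_values", since reg.score would return a single averaged value for a multi-column target. The names X, Y and results are only for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Illustrative data: the first rows from the question, three columns only.
df = pd.DataFrame({
    "date": pd.to_datetime(
        ["2004-09-30", "2004-10-29", "2004-11-30", "2004-12-31", "2005-01-31"]
    ),
    "var1": [6.252216, 6.861840, 8.171250, 9.702585, 12.064022],
    "var2": [10.502101, 9.776618, 10.704045, 11.371383, 10.628460],
    "var3": [4.965370, 4.719399, 4.949747, 5.422177, 6.390097],
})

# Assumption: use days since the first observation as the numeric predictor.
X = (df["date"] - df["date"].min()).dt.days.to_numpy().reshape(-1, 1)

# All columns except date become the (multi-output) target.
Y = df.drop("date", axis=1)

reg = LinearRegression().fit(X, Y)

# One R^2 per target column instead of a single averaged score.
per_column_r2 = r2_score(Y, reg.predict(X), multioutput="raw_values")

# With a 2-D target, coef_ has shape (n_targets, n_features)
# and intercept_ has shape (n_targets,).
results = pd.DataFrame(
    {"alpha": reg.intercept_, "beta": reg.coef_[:, 0], "error": per_column_r2},
    index=Y.columns,
)
print(results)
```

Each row of results then holds the intercept, slope and R^2 for one variable, which matches what a separate single-column fit would give for that variable.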