sklearn gives unexpected r2 score

Question

I have built a linear regression model and I would like to calculate the R² score of its output. However, the result is really unexpected:

As you can see below, the Pearson correlation between y and y_hat is positive, which, as I understand it, means the R² score should at least be positive. However, the result I get from sklearn is negative. How come? Thanks in advance!


import numpy as np
from sklearn.metrics import r2_score
from scipy.stats import pearsonr

y = np.array([ 5.2       ,  1.144     ,  3.3       ,  5.59741373,  1.438     ,
        7.562     ,  2.7       ,  0.22706035,  2.204     ,  2.396     ,
        4.314     , 12.51420331, 10.8       , 10.638     ,  5.101     ,
        3.861     ,  3.2       ,  3.8       ,  7.072     , -0.4597798 ,
       -0.9       ,  0.3       , -3.54      , -0.4       , -3.        ,
        0.7       ,  1.3       ,  1.5       ,  6.        ,  2.8       ,
        2.        ,  3.122     ])

y_hat = np.array([ 1.25131326,  2.64864629,  1.56201996,  4.26699994,  2.21499358,
        0.59113701,  2.40848854,  0.14954989,  0.45800824,  2.82399621,
        2.48736001,  2.78476975,  1.36378354,  3.4889863 ,  2.4226333 ,
        2.63939523,  4.15008518,  2.61525276,  2.29859288, -1.4358969 ,
       -3.67752652, -3.73173215, -2.67027158,  0.35012302,  3.91349371,
        5.11971861,  5.96586311,  3.36520449,  0.5204047 ,  1.584193  ,
       -0.05781178,  1.75957967])

r, p_value = pearsonr(y, y_hat)
print(r)                   # This gives around 0.299
print(r2_score(y, y_hat))  # This gives -0.18478241562914666

Answer 1

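I think I know what is going on here. I naively assumed that a positive correlation would lead to a positive R², but this is not the case. R² is 1 minus the ratio of the model's squared error to the squared error of always predicting the mean of y, so it is negative whenever the model does worse than that constant prediction. By calculating the mean squared error of y_hat vs. y and of y_avg (the mean of y) vs. y, I realized that y_hat is indeed a worse estimator than simply always predicting the average.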
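A minimal sketch of that comparison, reusing the y and y_hat arrays from the question. It relies on the definition R² = 1 - SS_res/SS_tot, which for a single output equals 1 - MSE(y_hat)/MSE(mean of y). The last few lines go beyond the original answer: they refit y as slope * y_hat + intercept with np.polyfit, and for that least-squares fit R² equals the squared Pearson correlation, which is the relationship the question was implicitly expecting.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

# Same data as in the question
y = np.array([5.2, 1.144, 3.3, 5.59741373, 1.438, 7.562, 2.7, 0.22706035,
              2.204, 2.396, 4.314, 12.51420331, 10.8, 10.638, 5.101, 3.861,
              3.2, 3.8, 7.072, -0.4597798, -0.9, 0.3, -3.54, -0.4, -3.0,
              0.7, 1.3, 1.5, 6.0, 2.8, 2.0, 3.122])
y_hat = np.array([1.25131326, 2.64864629, 1.56201996, 4.26699994, 2.21499358,
                  0.59113701, 2.40848854, 0.14954989, 0.45800824, 2.82399621,
                  2.48736001, 2.78476975, 1.36378354, 3.4889863, 2.4226333,
                  2.63939523, 4.15008518, 2.61525276, 2.29859288, -1.4358969,
                  -3.67752652, -3.73173215, -2.67027158, 0.35012302, 3.91349371,
                  5.11971861, 5.96586311, 3.36520449, 0.5204047, 1.584193,
                  -0.05781178, 1.75957967])

# MSE of the model vs. MSE of always predicting the mean of y
mse_model = mean_squared_error(y, y_hat)
mse_mean = mean_squared_error(y, np.full_like(y, y.mean()))

# R^2 = 1 - SS_res / SS_tot = 1 - mse_model / mse_mean
r2_manual = 1 - mse_model / mse_mean
print("MSE(y_hat):", mse_model)
print("MSE(mean): ", mse_mean)
print("manual R^2:", r2_manual, "sklearn R^2:", r2_score(y, y_hat))

# Refit y as an affine function of y_hat (slope + intercept); for this
# least-squares fit, R^2 equals the squared Pearson correlation.
slope, intercept = np.polyfit(y_hat, y, 1)
r2_refit = r2_score(y, slope * y_hat + intercept)
r = np.corrcoef(y, y_hat)[0, 1]
print("R^2 after refit:", r2_refit, "r^2:", r ** 2)
```

So a positive correlation only guarantees a positive R² once y_hat has been rescaled and shifted by a least-squares fit; r2_score evaluates y_hat exactly as given.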

http://www.fairlynerdy.com/what-is-r-squared/

Take a look at the graph at the link above: even if two series move in the same direction, an offset between them (a difference in intercept) can make the MSE, and therefore R², really bad.
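The graph from the link is not reproduced here, so the following toy example (numbers made up for illustration) sketches the same effect: predictions that differ from y only by a constant offset are perfectly correlated with y, yet their R² is strongly negative. Removing the offset with a least-squares line restores a perfect score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Toy data (made up for illustration): predictions track y perfectly
# but are shifted up by a constant offset of 3.
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_hat = y + 3.0

corr = np.corrcoef(y, y_hat)[0, 1]  # perfectly correlated
r2_shifted = r2_score(y, y_hat)     # penalised for the offset

print("Pearson r:", corr)
print("R^2 with offset:", r2_shifted)  # 1 - (5 * 3**2) / 10 = -3.5

# Fitting y = slope * y_hat + intercept removes the offset.
slope, intercept = np.polyfit(y_hat, y, 1)
r2_refit = r2_score(y, slope * y_hat + intercept)
print("R^2 after refit:", r2_refit)
```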

1 answer; solution 1 accepted, 2019-08-30 15:46:47