sklearn gives an unexpected R² score

I have built a linear regression model and I would like to calculate the R² score of its predictions. However, the result is really unexpected:

As you can see below, the Pearson correlation between y and y_hat is positive, which means the R² score should be at least positive. However, the result I got from sklearn is negative. How come? Thanks in advance!


import numpy as np
from sklearn.metrics import r2_score
from scipy.stats import pearsonr

y = np.array([ 5.2       ,  1.144     ,  3.3       ,  5.59741373,  1.438     ,
        7.562     ,  2.7       ,  0.22706035,  2.204     ,  2.396     ,
        4.314     , 12.51420331, 10.8       , 10.638     ,  5.101     ,
        3.861     ,  3.2       ,  3.8       ,  7.072     , -0.4597798 ,
       -0.9       ,  0.3       , -3.54      , -0.4       , -3.        ,
        0.7       ,  1.3       ,  1.5       ,  6.        ,  2.8       ,
        2.        ,  3.122     ])

y_hat = np.array([ 1.25131326,  2.64864629,  1.56201996,  4.26699994,  2.21499358,
        0.59113701,  2.40848854,  0.14954989,  0.45800824,  2.82399621,
        2.48736001,  2.78476975,  1.36378354,  3.4889863 ,  2.4226333 ,
        2.63939523,  4.15008518,  2.61525276,  2.29859288, -1.4358969 ,
       -3.67752652, -3.73173215, -2.67027158,  0.35012302,  3.91349371,
        5.11971861,  5.96586311,  3.36520449,  0.5204047 ,  1.584193  ,
       -0.05781178,  1.75957967])

r, p_value = pearsonr(y, y_hat)  # r (the Pearson correlation) is around 0.299
r2_score(y, y_hat)  # This gives -0.18478241562914666

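I think I know what is going on here. Basically, I naively thought a positive correlation would lead to a positive R², but this is not the case. By calculating the mean squared error of y_hat vs. y and of y_avg vs. y, I realized that y_hat is indeed a worse estimator than always just predicting the average. Since R² compares the model's squared error against that of the mean-only baseline (R² = 1 − SS_res / SS_tot), it goes negative exactly when the model does worse than that baseline.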
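The comparison described above can be reproduced roughly like this. It is a sketch that reuses the question's arrays and sklearn's `mean_squared_error`; the variable names `mse_model`, `mse_mean` and `r2_manual` are just illustrative. Because both MSEs divide by the same n, the ratio of the two MSEs equals SS_res / SS_tot, so `1 - mse_model / mse_mean` should match `r2_score`.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

y = np.array([5.2, 1.144, 3.3, 5.59741373, 1.438,
              7.562, 2.7, 0.22706035, 2.204, 2.396,
              4.314, 12.51420331, 10.8, 10.638, 5.101,
              3.861, 3.2, 3.8, 7.072, -0.4597798,
              -0.9, 0.3, -3.54, -0.4, -3.0,
              0.7, 1.3, 1.5, 6.0, 2.8,
              2.0, 3.122])

y_hat = np.array([1.25131326, 2.64864629, 1.56201996, 4.26699994, 2.21499358,
                  0.59113701, 2.40848854, 0.14954989, 0.45800824, 2.82399621,
                  2.48736001, 2.78476975, 1.36378354, 3.4889863, 2.4226333,
                  2.63939523, 4.15008518, 2.61525276, 2.29859288, -1.4358969,
                  -3.67752652, -3.73173215, -2.67027158, 0.35012302, 3.91349371,
                  5.11971861, 5.96586311, 3.36520449, 0.5204047, 1.584193,
                  -0.05781178, 1.75957967])

# Squared error of the model's predictions
mse_model = mean_squared_error(y, y_hat)

# Squared error of the trivial baseline that always predicts the mean of y
mse_mean = mean_squared_error(y, np.full_like(y, y.mean()))

# R^2 = 1 - SS_res / SS_tot; the 1/n factors in the two MSEs cancel
r2_manual = 1 - mse_model / mse_mean

print(mse_model, mse_mean)          # the model's MSE is the larger of the two
print(r2_manual, r2_score(y, y_hat))  # both negative and equal
```

If `mse_model` is larger than `mse_mean`, the ratio exceeds 1 and R² drops below zero, regardless of the sign of the correlation.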

http://www.fairlynerdy.com/what-is-r-squared/

Take a look at this graph from the link above: even if two series move in the same direction, the vertical offset caused by the intercept can make the error measured by MSE really large.

[Image: graph from the linked article]
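A small sketch of the offset effect, using made-up toy data rather than the question's arrays: predictions that track y exactly but are shifted by a constant have a Pearson correlation of 1, yet a strongly negative R². It also shows why correlation measures something different: for a least-squares line with an intercept, R² equals r², so r² describes how well the *best linear rescaling* of y_hat would do, not y_hat itself. The refit here uses `np.polyfit` as one convenient way to get that line.

```python
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score

# Toy data (hypothetical): predictions move exactly with the target
# but are offset by a constant of +5
y_demo = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y_pred = y_demo + 5.0

r, _ = pearsonr(y_demo, y_pred)
r2_offset = r2_score(y_demo, y_pred)
print(r)          # perfect correlation (1.0, up to rounding)
print(r2_offset)  # -11.5: SS_res = 125, SS_tot = 10

# Fit y_demo ≈ slope * y_pred + intercept by least squares, which removes the offset
slope, intercept = np.polyfit(y_pred, y_demo, 1)
y_calibrated = slope * y_pred + intercept
r2_calibrated = r2_score(y_demo, y_calibrated)
print(r2_calibrated, r ** 2)  # these agree: R^2 of the best linear fit equals r^2
```

By the same identity, refitting a line from the question's y_hat to y would give an R² of roughly 0.299² ≈ 0.09, instead of the −0.18 obtained by scoring y_hat directly.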
