Python sci-kit learn (metrics): difference between r2_score and explained_variance_score?

I noticed that r2_score and explained_variance_score are both built-in sklearn.metrics methods for regression problems.

I was always under the impression that r2_score is the percentage of variance explained by the model. How is it different from explained_variance_score?

When would you choose one over the other?

Thanks!

Most of the answers I found (including here) emphasize the difference between R² and the Explained Variance Score, that is: the mean residual (i.e. the mean error).

However, an important question remains: why on earth do I need to consider the mean error at all?


Refresher:

R²: the Coefficient of Determination, which measures the amount of variation explained by the (least-squares) linear regression.

You can look at it from a different angle for the purpose of evaluating the predicted values of y, like this:

Variance(actual_y) × R² = Variance(predicted_y)

So intuitively, the closer R² is to 1, the closer actual_y and predicted_y are to having the same variance (i.e. the same spread).
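For a concrete check, here is a minimal sketch (with made-up data, not from the original answer) showing that for an ordinary least-squares fit with an intercept, Variance(predicted_y) equals R² × Variance(actual_y):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 3 * X[:, 0] + rng.normal(size=100)   # noisy linear data

y_hat = LinearRegression().fit(X, y).predict(X)
r2 = r2_score(y, y_hat)

# For OLS with an intercept, Var(y_hat) == R^2 * Var(y):
print(np.var(y_hat))    # variance of the predictions...
print(r2 * np.var(y))   # ...matches R^2 times the variance of y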


As previously mentioned, the main difference is the Mean Error; and if we look at the formulas, we find that's true:

R² = 1 - [(Sum of Squared Residuals / n) / Variance(y_actual)]

Explained Variance Score = 1 - [Variance(y_predicted - y_actual) / Variance(y_actual)]

in which:

Variance(y_predicted - y_actual) = Sum of Squared Residuals / n - (Mean Error)²

So, obviously, the only difference is that the Explained Variance Score additionally subtracts the squared Mean Error! ... But why?
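That identity is easy to verify with numpy (reusing the numbers from the question; note that .var() defaults to the population variance, ddof=0):

import numpy as np

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
e = y_true - y_pred                 # residuals

n = len(e)
ssr = (e ** 2).sum()                # sum of squared residuals
print(e.var())                      # variance of the residuals
print(ssr / n - e.mean() ** 2)      # same value, via the identity above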


When we compare the R² Score with the Explained Variance Score, we are basically checking the Mean Error; so if R² = Explained Variance Score, that means: the Mean Error = zero!

The Mean Error reflects the tendency of our estimator, that is: biased vs. unbiased estimation.
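To see the practical consequence, here is a small sketch (data assumed for illustration) of a systematically biased predictor: shifting every prediction by a constant leaves explained_variance_score at a perfect 1.0, while r2_score drops because the mean error is no longer zero:

import numpy as np
from sklearn.metrics import explained_variance_score, r2_score

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = y_true + 1.0                            # every prediction is off by +1

print(explained_variance_score(y_true, y_pred))  # 1.0  (the constant offset is invisible)
print(r2_score(y_true, y_pred))                  # < 1.0 (the constant offset is penalized)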


In Summary:

If you want an unbiased estimator, so that our model neither underestimates nor overestimates, you may want to take the Mean Error into account.

OK, look at this example:

In [123]:
# setup and data
import numpy as np
from sklearn import metrics

y_true = [3, -0.5, 2, 7]
y_pred = [2.5, 0.0, 2, 8]
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.957173447537
0.948608137045
In [124]:
# what explained_variance_score really is
1 - np.cov(np.array(y_true) - np.array(y_pred)) / np.cov(y_true)
Out[124]:
0.95717344753747324
In [125]:
# what r^2 really is
1 - ((np.array(y_true) - np.array(y_pred)) ** 2).sum() / (4 * np.array(y_true).std() ** 2)
Out[125]:
0.94860813704496794
In [126]:
# notice that the mean residual is not 0
(np.array(y_true) - np.array(y_pred)).mean()
Out[126]:
-0.25
In [127]:
# if the predicted values change such that the mean residual IS 0:
y_pred = [2.5, 0.0, 2, 7]
(np.array(y_true) - np.array(y_pred)).mean()
Out[127]:
0.0
In [128]:
# then the two scores become the same
print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))
0.982869379015
0.982869379015

So, when the mean residual is 0, they are the same. Which one to choose depends on your needs, that is: is the mean residual supposed to be 0?
