Why do I get nan when using the sklearn R2 function?
I am always predicting the next value with an sklearn model.
y1_test, y2_test, y3_test, y4_test = get_test_targets(df)
ypred1, ypred2, ypred3, ypred4 = ml_model(df, ElasticNet())
I would like to use sklearn to measure the r2 score of y_true and y_predicted.
np.array([y2_test])
>> array([6.75233645])
np.array([ypred2[0]])
>> array([6.75233645])
Using r2_score(np.array([y2_test]), np.array([ypred2[0]])) gives nan.
I do not understand why I am getting nan.
There is a warning telling you what is wrong:
import numpy as np
from sklearn.metrics import r2_score
x = np.array([2.3])
y = np.array([2.1]) # exact values do not matter
r2_score(x, y)
Result:
UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
warnings.warn(msg, UndefinedMetricWarning)
nan
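This should not be a surprise: the definition of R^2 is
R^2 = 1 - (residual_sum_squares)/(total_sum_squares)
but with only one sample the denominator of the fraction is always 0, because a single value is equal to its own mean. In your case the prediction matches the target exactly, so the numerator is 0 as well, leading to a 0/0 division, which is indeed a nan (computationally, as well as mathematically). In fact sklearn does not attempt the division at all: as the warning says, it returns nan whenever there are fewer than two samples, which is why the exact values do not matter.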
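To see the 0/0 directly, here is a minimal sketch that computes R^2 from the definition above with plain NumPy. The helper name `manual_r2` is made up for illustration and is not part of sklearn; unlike `r2_score`, it has no guard for small sample counts, so it shows what the raw formula produces.

```python
import numpy as np

def manual_r2(y_true, y_pred):
    """Compute R^2 straight from its definition (hypothetical helper, no sample-count guard)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # total sum of squares
    # Silence NumPy's divide/invalid warnings so the raw nan/inf result is returned quietly
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - ss_res / ss_tot

# One sample, prediction equal to the target: 0/0
single_exact = manual_r2([6.75233645], [6.75233645])

# One sample, prediction different from the target: nonzero/0
single_off = manual_r2([2.3], [2.1])

# Several samples: the denominator is nonzero and R^2 is well defined
several = manual_r2([3.0, 5.0, 7.0], [2.5, 5.0, 7.5])

print(single_exact, single_off, several)
```

With one sample the raw formula gives nan when the prediction is exact and -inf otherwise; neither is a usable score, which is presumably why sklearn reports nan with a warning in both cases.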
Bottom line: you should not use only a single pair of data to compute R^2; batch together more pairs of predictions & ground truth samples in order to get meaningful R^2 results.
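As a rough sketch of that batching, you could collect one (target, prediction) pair per forecasting step and score them all together once you have at least two. The numbers below are invented placeholders, not values from your model; in your code they would come from `get_test_targets` and `ml_model`.

```python
import numpy as np
from sklearn.metrics import r2_score

# Placeholder (target, prediction) pairs, one per forecasting step (assumed values)
pairs = [(6.75, 6.70), (6.90, 6.95), (7.10, 7.00), (7.30, 7.35)]

y_true_batch = []
y_pred_batch = []
for target, prediction in pairs:
    y_true_batch.append(target)
    y_pred_batch.append(prediction)

# With several samples the total sum of squares is nonzero, so R^2 is defined
score = r2_score(np.array(y_true_batch), np.array(y_pred_batch))
print(score)
```

Note that the targets still need to vary: if every value in the batch is identical, the total sum of squares is 0 again and the score is not meaningful.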