Why do I get nan when using the sklearn R2 function?

Question

I am always predicting the next value with an sklearn model:

y1_test, y2_test, y3_test, y4_test = get_test_targets(df)

ypred1, ypred2, ypred3, ypred4 = ml_model(df, ElasticNet())

I would like to use sklearn to measure the r2 score of y_true and y_predicted:

np.array([y2_test])
>> array([6.75233645])

np.array([ypred2[0]])
>> array([6.75233645])

Using r2_score(np.array([y2_test]), np.array([ypred2[0]])) gives nan.

I do not understand why I am getting nan.

Answer 1

There is a warning telling you what is wrong:

import numpy as np
from sklearn.metrics import r2_score

x = np.array([2.3])
y = np.array([2.1]) # exact values do not matter

r2_score(x, y)

Result:

UndefinedMetricWarning: R^2 score is not well-defined with less than two samples.
  warnings.warn(msg, UndefinedMetricWarning)

nan
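If warnings are filtered out in your environment, you might never see this message. A minimal sketch, using the standard-library warnings module and sklearn.exceptions.UndefinedMetricWarning, that turns this particular warning into an exception so a single-sample score cannot pass silently (the variable name caught is just illustrative):

```python
import warnings

import numpy as np
from sklearn.exceptions import UndefinedMetricWarning
from sklearn.metrics import r2_score

caught = None
with warnings.catch_warnings():
    # Inside this block, UndefinedMetricWarning is raised as an exception.
    warnings.simplefilter("error", category=UndefinedMetricWarning)
    try:
        r2_score(np.array([2.3]), np.array([2.1]))  # only one sample
    except UndefinedMetricWarning as exc:
        caught = exc

print(f"Caught: {caught}")
```

Outside the with block the previous warning filters are restored, so this does not change behaviour elsewhere in the program.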

This should not be a surprise: the definition of R^2 is

R^2 = 1 - (residual_sum_squares)/(total_sum_squares)

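but with only one sample the total sum of squares is always 0 (a single value equals its own mean), and in your case the prediction also matches the target exactly, so the residual sum of squares is 0 as well. That is a 0/0 division, which is indeed a nan (computationally, as well as mathematically). Even when the residual is nonzero, as in the example above, R^2 is undefined for a single sample, and r2_score returns nan with the warning above whenever it gets fewer than two samples.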
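To see which terms vanish, here is a minimal NumPy sketch that computes R^2 directly as 1 - SS_res / SS_tot (residual sum of squares over total sum of squares). The helpers sums_of_squares and r2_manual are hypothetical names for illustration, not part of sklearn, and r2_manual does none of the special-casing r2_score does:

```python
import numpy as np
from sklearn.metrics import r2_score


def sums_of_squares(y_true, y_pred):
    """Return (residual sum of squares, total sum of squares)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)         # prediction errors
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # spread of the targets
    return ss_res, ss_tot


def r2_manual(y_true, y_pred):
    """R^2 = 1 - SS_res / SS_tot, with no guard against SS_tot == 0."""
    ss_res, ss_tot = sums_of_squares(y_true, y_pred)
    return 1.0 - ss_res / ss_tot


# Several samples: the plain formula agrees with sklearn.
y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
print(r2_manual(y_true, y_pred), r2_score(y_true, y_pred))  # both ≈ 0.9486

# One sample: the value equals its own mean, so SS_tot is 0; here the
# prediction is also exact, so SS_res is 0 too and the ratio is 0/0.
print(sums_of_squares([6.75233645], [6.75233645]))
```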

Bottom line: you should not compute R^2 from a single pair of data; batch together more pairs of predictions and ground-truth values in order to get meaningful R^2 results.
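A minimal sketch of that batching, assuming a loop where each forecasting step yields one (target, prediction) pair, like y2_test and ypred2[0] in the question. The steps list and its values are made up for illustration:

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical (target, one-step-ahead prediction) pairs, one per step.
steps = [(6.75, 6.70), (6.80, 6.78), (6.90, 6.95), (7.05, 7.00), (7.10, 7.12)]

y_true, y_pred = [], []
for target, prediction in steps:
    # Scoring this single pair would give nan, so just collect it.
    y_true.append(target)
    y_pred.append(prediction)

# Score all collected pairs at once. With two or more samples r2_score no
# longer returns nan for lack of data; R^2 is meaningful as long as the
# targets actually vary.
score = r2_score(np.array(y_true), np.array(y_pred))
print(f"R^2 over {len(y_true)} steps: {score:.4f}")
```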

Answer 1: accepted (score 2), 2020-10-27 11:46:40
