Can a good model have a low R-squared value?

Question

I built a linear regression model using scikit-learn.

When I check the mean squared error on the test data, it is very low (0.09).

When I check the r2_score on the test data, it is also very low (0.05).

As far as I know, a low mean squared error means the model is good, but a very low r2_score says the model is not good.

So I can't tell whether my regression model is good or not.

Can a good model have a low R-squared value, or can a bad model have a low mean squared error?

Answer 1

R^2 is a measure of how well your fit represents the data.
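This also connects R^2 to the MSE in the question. R^2 is one minus the residual sum of squares over the total sum of squares, which (dividing both by n) is the same as 1 - MSE / Var(y). An MSE of 0.09 is therefore only "low" relative to how much the test targets vary. The sketch below uses made-up arrays (an assumption, not the asker's data) to check that identity against scikit-learn's `mean_squared_error` and `r2_score`, then back-solves the target variance implied by MSE = 0.09 and R^2 = 0.05.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)

# Hypothetical test targets and weak predictions (not the asker's data).
y_true = rng.normal(loc=0.0, scale=0.3, size=500)
y_pred = 0.2 * y_true + rng.normal(scale=0.25, size=500)

mse = mean_squared_error(y_true, y_pred)
# np.var divides by n (ddof=0), matching how R^2 normalises the total sum of squares.
r2_by_hand = 1 - mse / np.var(y_true)

print(f"MSE = {mse:.4f}, Var(y) = {np.var(y_true):.4f}")
print(f"r2_score = {r2_score(y_true, y_pred):.4f}, 1 - MSE/Var(y) = {r2_by_hand:.4f}")

# Variance of the targets implied by the question's numbers:
# R^2 = 1 - MSE / Var(y)  =>  Var(y) = MSE / (1 - R^2)
implied_var = 0.09 / (1 - 0.05)
print(f"Implied Var(y_test) = {implied_var:.4f}")
```

With MSE = 0.09 and R^2 = 0.05, the test targets would have a variance of only about 0.095, so the model's error is almost as large as simply predicting the mean of y.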

Let's say your data has a linear trend with some noise on it. We can construct such data and see how R^2 changes:

Data

I'm going to create some data using numpy:

import numpy as np

xs = np.random.randint(10, 1000, 2000)  # 2000 integers in [10, 1000)
ys = (3 * xs + 8) + np.random.randint(5, 10, 2000)  # trend 3x + 8 plus small noise in [5, 10)

Fit

Now we can create a fit object using scikit-learn:

from sklearn.linear_model import LinearRegression

reg = LinearRegression().fit(xs.reshape(-1, 1), ys.reshape(-1, 1))

And we can get the score (R^2) from this fit:

reg.score(xs.reshape(-1, 1), ys.reshape(-1, 1))

My R^2 was: 0.9999971914416896
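The near-perfect score agrees with the fitted parameters. The noise from `np.random.randint(5, 10, 2000)` is an integer from 5 to 9 (mean 7), so the slope should come out close to 3 and the intercept close to 8 + 7 = 15. A quick check, regenerating the data with a fixed seed (an added assumption, for reproducibility) and passing `ys` as a 1-D array so `coef_` is easy to read:

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)  # fixed seed so the run is reproducible (added assumption)
xs = np.random.randint(10, 1000, 2000)
ys = (3 * xs + 8) + np.random.randint(5, 10, 2000)

reg = LinearRegression().fit(xs.reshape(-1, 1), ys)

# With a 1-D target, coef_ has shape (1,) and intercept_ is a scalar.
print("slope:", reg.coef_[0])
print("intercept:", reg.intercept_)
print("R^2:", reg.score(xs.reshape(-1, 1), ys))
```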

Bad data

Let's say we have a more scattered set of data (with more noise on it):

ys2 = (3 * xs + 8) + np.random.randint(500, 1000, 2000)

Now we can calculate the score on ys2 to see how well our fit represents the xs, ys2 data:

reg.score(xs.reshape(-1, 1), ys2.reshape(-1, 1))

My R^2 was: 0.2377175028951054

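The score is low. We know the trend of the data did not change: it is still 3x + 8 + (noise). But the ys2 points lie further away from the fit. Note that this noise is not centred on zero (it averages about 750), so much of that distance is a constant vertical offset between ys2 and the line fitted on ys, on top of the larger scatter.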
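To check how much of this drop is scatter and how much is the line simply sitting in the wrong place for ys2, one can print the mean residual of the original fit on ys2 and compare its score with a model refitted on xs, ys2. A sketch, with a seed added for reproducibility (an assumption, not part of the original):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

np.random.seed(0)  # added for reproducibility
xs = np.random.randint(10, 1000, 2000)
ys = (3 * xs + 8) + np.random.randint(5, 10, 2000)
ys2 = (3 * xs + 8) + np.random.randint(500, 1000, 2000)
X = xs.reshape(-1, 1)

reg = LinearRegression().fit(X, ys)    # fitted on the low-noise data
reg2 = LinearRegression().fit(X, ys2)  # refitted on the scattered data

# Average vertical gap between ys2 and the line fitted on ys.
print("mean residual of reg on ys2:", np.mean(ys2 - reg.predict(X)))
print("R^2 of reg on ys2:", reg.score(X, ys2))
print("R^2 of reg2 on ys2:", reg2.score(X, ys2))
print("reg2 slope, intercept:", reg2.coef_[0], reg2.intercept_)
```

If the refitted score is far above the original one, the offset, not the scatter, explains most of the low R^2; the refitted slope should still be close to 3.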

So R^2 is an indicator of how well your fit represents the data, but the condition of the data itself matters too. When the data is scattered by noise, even the best possible fit may get a low score.
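To illustrate that last point without any constant offset, the sketch below uses zero-mean Gaussian noise whose standard deviation (2000, an assumed value) is large relative to the spread of 3x + 8. `make_data` is a hypothetical helper for this example. Even the true generating line y = 3x + 8, the best model one could ask for, gets a low R^2 on test data, while the fitted slope still lands near 3.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)

def make_data(n):
    """Hypothetical helper: trend 3x + 8 plus zero-mean Gaussian noise (sd 2000, assumed)."""
    x = rng.integers(10, 1000, n)
    y = 3 * x + 8 + rng.normal(0, 2000, n)
    return x.reshape(-1, 1), y

X_train, y_train = make_data(20_000)
X_test, y_test = make_data(20_000)

model = LinearRegression().fit(X_train, y_train)

r2_model = model.score(X_test, y_test)
# The "oracle": the true generating line, with no estimation error at all.
r2_true_line = r2_score(y_test, 3 * X_test.ravel() + 8)

print(f"fitted slope: {model.coef_[0]:.3f}")
print(f"R^2 of fitted model on test data: {r2_model:.3f}")
print(f"R^2 of the true line 3x + 8 on test data: {r2_true_line:.3f}")
```

Both scores stay low because most of the variance in y is noise that no function of x can explain, so a low R^2 here says more about the data than about the model.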

Answer 1 (accepted, 2021-10-27 05:38:00)