Can a good model have a low R-squared value?
I fit a linear regression using scikit-learn.
On the test data, my mean squared error is very low (0.09).
On the same test data, my r2_score is also very low (0.05).
As far as I know, a low mean squared error indicates a good model, but a very low r2_score indicates a poor one.
I can't tell whether my regression model is good or not.
Can a good model have a low R-squared value, or can a bad model have a low mean squared error?
R^2 is a measure of how well your fit represents the data.
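For reference, the usual definition of the coefficient of determination, which is what scikit-learn's `r2_score` and `LinearRegression.score` compute, can be written as follows. The symbols $y_i$ (observed values), $\hat{y}_i$ (predictions), and $\bar{y}$ (mean of the observed values) are notation introduced here for illustration.

```latex
R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}
```

The numerator is $n$ times the mean squared error and the denominator is $n$ times the variance of the observed values, so R^2 compares the model's error to the spread of the data rather than reporting the error on its own.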
Let's say your data has a linear trend with some noise on top. We can construct such data and see how R^2 changes:
I'm going to create some data using numpy:
import numpy as np
xs = np.random.randint(10, 1000, 2000)
ys = (3 * xs + 8) + np.random.randint(5, 10, 2000)
Now we can create a fit object using scikit-learn:
from sklearn.linear_model import LinearRegression
reg = LinearRegression().fit(xs.reshape(-1, 1), ys.reshape(-1, 1))
And we can get the score (R^2) from this fit:
reg.score(xs.reshape(-1, 1), ys.reshape(-1, 1))
My R^2 was: 0.9999971914416896
Now let's say we have a more scattered data set (one with more noise on it):
ys2 = (3 * xs + 8) + np.random.randint(500, 1000, 2000)
Now we can compute the score against ys2 to see how well our existing fit (trained on ys) represents the xs, ys2 data:
reg.score(xs.reshape(-1, 1), ys2.reshape(-1, 1))
My R^2 was: 0.2377175028951054
The score is low. We know the trend of the data did not change: it is still 3x + 8 + (noise). But the ys2 points lie farther from the fit.
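As a rough check of how much of that distance is scatter and how much is a constant shift, the sketch below scores the original fit on ys2 and then refits on ys2. It assumes a fixed seed via `np.random.default_rng`, so the numbers will differ from those quoted in this answer; `reg_refit`, `stale_score`, and `refit_score` are names made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility

xs = rng.integers(10, 1000, 2000)
ys = (3 * xs + 8) + rng.integers(5, 10, 2000)        # small noise
ys2 = (3 * xs + 8) + rng.integers(500, 1000, 2000)   # larger noise, shifted upward

X = xs.reshape(-1, 1)

# Fit on the low-noise data, then score against the noisier data (as in the answer).
reg = LinearRegression().fit(X, ys)
stale_score = reg.score(X, ys2)

# Refit on the noisier data so the intercept can absorb the noise's mean.
reg_refit = LinearRegression().fit(X, ys2)
refit_score = reg_refit.score(X, ys2)

print("fit on ys, scored on ys2: ", stale_score)
print("fit on ys2, scored on ys2:", refit_score)
print("intercepts:", reg.intercept_, reg_refit.intercept_)
```

In this setup the refitted model should score noticeably higher, because the added noise has a nonzero mean that the first fit's intercept never saw. Whatever gap remains between the refitted score and 1.0 is the part caused by scatter.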
So R^2 is an indicator of how well your fit represents the data, but the condition of the data itself matters. Even with a low score, what you have may be the best possible fit, because the data is scattered by noise.
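To connect this back to the numbers in the question, here is a minimal sketch of how a low mean squared error and a low R^2 can occur together: the target simply varies very little, so even a small error is large relative to that spread. The data (a slope of 0.25, Gaussian noise with standard deviation 0.3, a fixed seed) are invented for illustration and are not the asker's data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

rng = np.random.default_rng(0)  # fixed seed: an assumption for reproducibility

# Hypothetical data: a weak linear trend buried in noise of standard deviation 0.3.
x = rng.uniform(0.0, 1.0, 2000)
y = 0.25 * x + rng.normal(0.0, 0.3, 2000)

X = x.reshape(-1, 1)
model = LinearRegression().fit(X, y)
pred = model.predict(X)  # scored on the same data it was fit on, for simplicity

mse = mean_squared_error(y, pred)
r2 = r2_score(y, pred)

print("MSE:", mse)            # small in absolute terms
print("Var(y):", np.var(y))   # but the target's variance is small too
print("R^2:", r2)

# On a given data set, R^2 equals 1 - MSE / Var(y).
assert np.isclose(r2, 1 - mse / np.var(y))
```

Whether an MSE of 0.09 is good therefore depends on the scale of the target: compare it to the variance of y (equivalently, to the MSE of always predicting the mean) before judging the model.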