lines() not properly displaying quadratic fit

I'm simply trying to display the fit I've generated using lm(), but the lines() function gives me a weird result in which multiple lines come out of one point.

Here is my code:

library(ISLR)
data(Wage)
lm.mod<-lm(wage~poly(age, 4), data=Wage)
Wage$lm.fit<-predict(lm.mod, Wage)

plot(Wage$age, Wage$wage)
lines(Wage$age, Wage$lm.fit, col="blue")

I've tried resetting my plot with dev.off(), but I've had no luck. I'm using RStudio. FWIW, the line shows up perfectly fine if I make the regression linear only, but as soon as I make it quadratic or higher (using I(age^2) or poly()), I get a weird graph. Also, the points() function works fine with poly().

Thanks for the help.

Because you forgot to order the points by age first, lines() connects them in the order they appear in the data, so the line jumps back and forth between ages. This is happening for the linear regression too; the reason it looks fine there is that traveling between any set of points along a straight line...stays on the line!

plot(Wage$age, Wage$wage)
lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue')


Consider increasing the line width for a better view:

lines(sort(Wage$age), Wage$lm.fit[order(Wage$age)], col = 'blue', lwd = 3)
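
An alternative to sorting x and y separately is to sort the whole data frame once, so each age stays paired with its fitted value. The following is a minimal sketch on simulated data; `dat`, `fit`, and `dat_sorted` are made-up names, and the simulated wages are only a stand-in for the Wage data so the example runs without ISLR:

```r
# Simulated stand-in for the Wage data (an assumption, not the real data set).
set.seed(1)
dat <- data.frame(age = sample(18:80, 200, replace = TRUE))
dat$wage <- 50 + 3 * dat$age - 0.03 * dat$age^2 + rnorm(200, sd = 10)

# Quadratic fit, with fitted values stored alongside the data.
fit <- lm(wage ~ poly(age, 2), data = dat)
dat$fit <- predict(fit, dat)

# Reorder all rows by age; lines() then moves left to right across the plot.
dat_sorted <- dat[order(dat$age), ]

plot(dat$age, dat$wage, col = "grey")
lines(dat_sorted$age, dat_sorted$fit, col = "blue", lwd = 2)
```

Passing the unsorted `dat$age` and `dat$fit` to lines() instead should reproduce the criss-crossing lines described in the question.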


Just to add another, more general tip on plotting model predictions:

An often-used strategy is to create a new data set (e.g. newdat) that contains a sequence of values for your predictor variables across their range of possible values, and then use this data to show your predicted values. In this data set you have a good spread of predictor values, but that may not always be the case. With the new data set, you can ensure that your line is drawn from evenly spaced values across the variable's range:

Example

newdat <- data.frame(age=seq(min(Wage$age), max(Wage$age), length.out=1000))
newdat$pred <- predict(lm.mod, newdata=newdat)
plot(Wage$age, Wage$wage, col=8, ylab="Wage", xlab="Age")
lines(newdat$age, newdat$pred, col="blue", lwd=2)
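
To illustrate the case where the predictor values are not well spread, here is a small sketch on simulated data with a gap in the ages. The data, the names `dat`, `fit`, and `grid`, and the cubic fit are all assumptions chosen for illustration:

```r
# Hypothetical data: ages clustered in two groups, with nothing in between.
set.seed(42)
dat <- data.frame(age = c(runif(60, 18, 30), runif(60, 60, 80)))
dat$wage <- 40 + 4 * dat$age - 0.04 * dat$age^2 + rnorm(120, sd = 8)

fit <- lm(wage ~ poly(age, 3), data = dat)

# Evenly spaced grid over the observed range of the predictor.
grid <- data.frame(age = seq(min(dat$age), max(dat$age), length.out = 200))
grid$pred <- predict(fit, newdata = grid)

plot(dat$age, dat$wage, col = "grey", xlab = "Age", ylab = "Wage")
lines(grid$age, grid$pred, col = "blue", lwd = 2)
```

Connecting only the observed ages would bridge the gap with a single straight segment, whereas the grid draws the fitted curve through it. That part of the curve is interpolation by the model, not something the data support directly.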
