
Plot “regression line” from multiple regression in R

I ran a multiple regression with several continuous predictors, a few of which came out significant, and I'd like to create a scatterplot (or scatter-like plot) of my DV (dependent variable) against one of the predictors, including a "regression line". How can I do this?

My plot looks like this:

D = my.data; plot( D$probCategorySame, D$posttestScore )

If it were a simple regression, I could add a regression line like this:

lmSimple <- lm( posttestScore ~ probCategorySame, data=D )
abline( lmSimple ) 

But my actual model is like this:

lmMultiple <- lm( posttestScore ~ pretestScore + probCategorySame + probDataRelated + practiceAccuracy + practiceNumTrials, data=D )

I would like to add a regression line that reflects the coefficient and intercept from the actual model instead of the simplified one. I think I'd be happy to assume mean values for all other predictors in order to do this, although I'm open to advice to the contrary.

This might make no difference, but I'll mention it just in case: the situation is slightly complicated by the fact that I probably won't want to plot the original data. Instead, I'd like to plot mean values of the DV for binned values of the predictor, like so:

D[,'probCSBinned'] = cut( my.data$probCategorySame, as.numeric( seq( 0,1,0.04 ) ), include.lowest=TRUE, right=FALSE, labels=FALSE )
D = aggregate( posttestScore~probCSBinned, data=D, FUN=mean )
plot( D$probCSBinned, D$posttestScore )

I do this only because it looks much cleaner for my data.
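One thing to watch with this binning: because of `labels=FALSE`, `probCSBinned` holds bin indices (1 to 25), not probabilities, so any regression line computed on the `probCategorySame` scale will not line up with the binned points. A minimal sketch, using simulated data in place of `my.data` (the sample size and coefficients are made up for illustration), maps each bin index to its interval midpoint before plotting, and keeps the aggregated result in a separate `binned` object rather than overwriting `D`:

```r
# Simulated stand-in for my.data (hypothetical values, illustration only)
set.seed(1)
n <- 500
D <- data.frame(probCategorySame = runif(n))
D$posttestScore <- 20 + 8 * D$probCategorySame + rnorm(n, sd = 3)

# Same binning as in the question: 25 bins of width 0.04 on [0, 1]
breaks <- seq(0, 1, by = 0.04)
D$probCSBinned <- cut(D$probCategorySame, breaks, include.lowest = TRUE,
                      right = FALSE, labels = FALSE)
binned <- aggregate(posttestScore ~ probCSBinned, data = D, FUN = mean)

# Map each bin index to the midpoint of its interval so the x-axis
# stays on the probCategorySame scale
mids <- (head(breaks, -1) + tail(breaks, -1)) / 2
binned$mid <- mids[binned$probCSBinned]

plot(binned$mid, binned$posttestScore,
     xlab = "probCategorySame (bin midpoint)", ylab = "mean posttestScore")
# A line fitted on the original scale now lines up with the binned points
abline(lm(posttestScore ~ probCategorySame, data = D))
```

The same midpoint trick applies to a line derived from the multiple-regression model, since that line is also expressed on the original `probCategorySame` scale.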

To plot the individual terms in a linear or generalised linear model (i.e., one fitted with lm or glm), use termplot. There is no need for binning or other manipulation.

# plot everything on one page
par(mfrow=c(2,3))
termplot(lmMultiple)

# plot individual term
par(mfrow=c(1,1))
termplot(lmMultiple, terms="pretestScore")
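Note that termplot draws partial effects: each term is centred, so the y-axis shows that predictor's contribution relative to its average rather than raw posttestScore values. The sketch below uses simulated data (variable names follow the question; the values and coefficients are invented for illustration) and assumes an R version whose termplot() has the `plot` argument. It adds partial residuals and standard-error bands, then uses `plot = FALSE` to retrieve the plotted values and check that the drawn line's slope equals the fitted coefficient:

```r
# Simulated stand-in for my.data (hypothetical values, illustration only)
set.seed(42)
n <- 200
D <- data.frame(
  pretestScore      = rnorm(n, 50, 10),
  probCategorySame  = runif(n),
  probDataRelated   = runif(n),
  practiceAccuracy  = runif(n),
  practiceNumTrials = rpois(n, 20)
)
D$posttestScore <- 10 + 0.5 * D$pretestScore + 8 * D$probCategorySame +
  2 * D$probDataRelated + 5 * D$practiceAccuracy +
  0.1 * D$practiceNumTrials + rnorm(n, sd = 3)

lmMultiple <- lm(posttestScore ~ pretestScore + probCategorySame + probDataRelated +
                   practiceAccuracy + practiceNumTrials, data = D)

# Partial effect of one predictor, with partial residuals as points
# and pointwise standard-error bands
termplot(lmMultiple, terms = "probCategorySame", partial.resid = TRUE, se = TRUE)

# plot = FALSE returns the data termplot would draw (one data frame per term)
tp <- termplot(lmMultiple, terms = "probCategorySame", plot = FALSE)[[1]]
# The drawn line is the centred term b * (x - mean(x)), so its slope is the coefficient
slope <- coef(lm(y ~ x, data = tp))[["x"]]
```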

You need to create a vector of x-values in the domain of your plot and predict their corresponding y-values from your model. To do this, you need to put this vector into a data frame whose variables match those in your model. You stated that you are OK with keeping the other variables fixed at their mean values, so I have used that approach in my solution. When setting this up, consider whether the x-values you predict at are plausible given the other values in your data; predicting outside the observed range of the predictor is extrapolation.
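Because the model is linear in its predictors, fixing every other predictor at its mean turns the fitted model into a straight line in the plotted predictor $x_k$ (here `probCategorySame`). Its slope is that predictor's coefficient $b_k$, and its intercept absorbs all the other terms:

$$
\hat{y}(x_k) = \Big(b_0 + \sum_{j \neq k} b_j \bar{x}_j\Big) + b_k x_k
$$

where $b_0$ is the model intercept, $b_j$ are the fitted coefficients and $\bar{x}_j$ are the sample means of the other predictors.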

Without sample data I can't be sure this will work exactly for you, so I apologize if there are any bugs below, but this should at least illustrate the approach.

# Setup
D = my.data
xmin = min(D$probCategorySame); xmax = max(D$probCategorySame) # domain of your plot: stay within the observed range
plot( D$probCategorySame, D$posttestScore, xlim=c(xmin,xmax) )
lmMultiple <- lm( posttestScore ~ pretestScore + probCategorySame + probDataRelated + practiceAccuracy + practiceNumTrials, data=D )

# Create a dummy data frame in which every other predictor is held at its mean
# value in each row, while the variable we want to plot (probCategorySame)
# varies incrementally over the domain of the plot. We need this object to get
# the predicted values we want to plot.
N = 100
others = c("pretestScore", "probDataRelated", "practiceAccuracy", "practiceNumTrials")
means = colMeans(D[, others])
dummyDF = as.data.frame(as.list(means))[rep(1, N), ] # N identical rows of the means
xv = seq(xmin, xmax, length.out=N)
dummyDF$probCategorySame = xv # must match the predictor name used in the model

# Getting and plotting predictions over our dummy data.
yv = predict(lmMultiple, newdata=dummyDF)
lines(xv, yv)
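For a purely linear model like this one, the dummy data frame is optional: the line through the predictions has slope equal to the `probCategorySame` coefficient and an intercept equal to the model intercept plus each other coefficient times its predictor's mean, so `abline(a, b)` can draw it directly. A sketch on simulated data (hypothetical values standing in for `my.data`) that also cross-checks this shortcut against `predict()`:

```r
# Simulated stand-in for my.data (hypothetical values, illustration only)
set.seed(42)
n <- 200
D <- data.frame(
  pretestScore      = rnorm(n, 50, 10),
  probCategorySame  = runif(n),
  probDataRelated   = runif(n),
  practiceAccuracy  = runif(n),
  practiceNumTrials = rpois(n, 20)
)
D$posttestScore <- 10 + 0.5 * D$pretestScore + 8 * D$probCategorySame +
  2 * D$probDataRelated + 5 * D$practiceAccuracy +
  0.1 * D$practiceNumTrials + rnorm(n, sd = 3)

lmMultiple <- lm(posttestScore ~ pretestScore + probCategorySame + probDataRelated +
                   practiceAccuracy + practiceNumTrials, data = D)

cf <- coef(lmMultiple)
others <- c("pretestScore", "probDataRelated", "practiceAccuracy", "practiceNumTrials")
# Intercept of the line: model intercept plus every other term at its mean
a <- cf[["(Intercept)"]] + sum(cf[others] * colMeans(D[others]))
slope <- cf[["probCategorySame"]]

plot(D$probCategorySame, D$posttestScore)
abline(a = a, b = slope)

# Cross-check against predict() at x = 0 and x = 1, other predictors at their means
nd <- as.data.frame(as.list(colMeans(D[others])))[c(1, 1), ]
nd$probCategorySame <- c(0, 1)
p <- unname(predict(lmMultiple, newdata = nd))
```

The `predict()` route generalises better (interactions, polynomial terms, `glm` links), while `abline()` is only valid when the plotted predictor enters the model as a single linear term.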

Look at the Predict.Plot function in the TeachingDemos package; it has an option to plot the response against one predictor given values of the other predictors.

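Disclaimer: Technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.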

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM