Boxplot of Linear Regression Model with several Dummy coded predictors in R
I have the following linear model:
model <- lm(var01 ~ a0 + a1 + a2 + a3 + a4 + a5, data = NT)
Here var01 is an interval-scaled variable ranging from 0 to 100, and a0-a5 are dummy-coded (0, 1) variables. summary(model) gives this:
Residuals:
     Min       1Q   Median       3Q      Max
 -75.951  -13.469   -7.239   18.795   80.531
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  59.6015     8.7076   6.845 5.48e-10 ***
a01         -46.1329     8.6302  -5.345 5.37e-07 ***
a11          -0.8744     9.0549  -0.097   0.9233
a21          22.0408     9.1278   2.415   0.0175 *
a31           9.5488     9.9284   0.962   0.3384
a41          14.9227     7.6762   1.944   0.0546 .
a51          -8.1222    11.8530  -0.685   0.4947
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 32.13 on 104 degrees of freedom
Multiple R-squared: 0.4393, Adjusted R-squared: 0.407
F-statistic: 13.58 on 6 and 104 DF, p-value: 2.486e-11
I would like to create a boxplot where a0-a5 are displayed next to each other, but only for the observations where a0 == 1, a1 == 1, etc.
So I tried:
ggplot(NT, aes(factor(a0), var01)) +
geom_boxplot() +
geom_smooth(method = "lm", se=FALSE, color="black", aes(group=1))
But this shows the boxplots for a0 == 0 and a0 == 1 next to each other. So two questions: How do I get R to show only a0 == 1? And furthermore, how do I get the other five predictors a1-a5 next to a0 (each also limited to the value 1) in the same graphic?
Help is much appreciated. Thanks :)
Update: Sample data
id category_a var01 a0 a1 a2 a3 a4 a5
 3 1;5          100  0  1  0  0  0  1
 4 1;5            0  0  1  0  0  0  1
 5 0             21  1  0  0  0  0  0
 6 1;2;4        100  0  1  1  0  1  0
 9 1;2           68  0  1  1  0  0  0
So a0-a5 are dummy codings of the multi-category variable "category_a".
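For anyone who wants to reproduce this, here is a minimal sketch of how such dummy columns could be derived from category_a. It assumes the categories are stored as semicolon-separated strings exactly as in the sample rows; the helper name `codes` is purely illustrative and not part of the original data preparation.

```r
# Sample rows copied from the table above (category_a kept as text)
NT <- data.frame(
  id         = c(3, 4, 5, 6, 9),
  category_a = c("1;5", "1;5", "0", "1;2;4", "1;2"),
  var01      = c(100, 0, 21, 100, 68),
  stringsAsFactors = FALSE
)

# Split each "1;5"-style string into its individual category codes
codes <- strsplit(NT$category_a, ";", fixed = TRUE)

# Create one 0/1 column per category 0..5 (assumed category range)
for (k in 0:5) {
  NT[[paste0("a", k)]] <- as.integer(
    vapply(codes, function(x) as.character(k) %in% x, logical(1))
  )
}

NT
```

Because a row can belong to several categories at once, the resulting dummies are not mutually exclusive, which matters for how the boxplots are read.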
It's a question of data reshaping. ggplot works best if each data point you're interested in is one row in a data frame (long format).
library(ggplot2)
library(reshape2)
#generate data
set.seed(1)
n=1000
NT <- data.frame(id=1:n,
var01=rnorm(n),
a0=rbinom(n,1,0.2),
a1=rbinom(n,1,0.2),
a2=rbinom(n,1,0.2),
a3=rbinom(n,1,0.2),
a4=rbinom(n,1,0.2),
a5=rbinom(n,1,0.2))
#do some data-reshaping before plotting
#ggplot needs each data-point on one line
#so transform to long
plotdata <- melt(NT,id.vars=c("id","var01"),variable.name="a")
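As far as I know, reshape2 is retired and its maintainers point to tidyr instead, so the same wide-to-long step could also be sketched with tidyr::pivot_longer. This is an alternative under that assumption, not what the code above uses; the names `NT_sim`, `plotdata_tidy` and `p_tidy` are made up here so nothing above is overwritten.

```r
library(ggplot2)
library(tidyr)

# Simulated data with the same shape as above (illustrative only)
set.seed(1)
n <- 1000
dummies <- as.data.frame(replicate(6, rbinom(n, 1, 0.2)))
names(dummies) <- paste0("a", 0:5)
NT_sim <- cbind(data.frame(id = 1:n, var01 = rnorm(n)), dummies)

# Wide -> long: one row per (observation, dummy variable) pair
plotdata_tidy <- pivot_longer(
  NT_sim,
  cols      = a0:a5,
  names_to  = "a",
  values_to = "value"
)

# Boxplots only for the rows where the dummy equals 1
p_tidy <- ggplot(subset(plotdata_tidy, value == 1), aes(x = a, y = var01)) +
  geom_boxplot()
p_tidy
```

Note that pivot_longer stores `a` as a character column rather than a factor; ggplot treats both as discrete, so the plotting code does not need to change.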
Now it's very easy to plot everything:
#plot everything using interaction
p1 <- ggplot(plotdata, aes(x=interaction(a,value), y=var01)) +
geom_boxplot()
p1
Or a selection:
p2 <- ggplot(plotdata[plotdata$value==1,],
aes(x=a, y=var01)) +
geom_boxplot()
p2
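One thing to keep in mind when reading the selection plot: the dummies in the question are not mutually exclusive (a row can have a1 == 1 and a5 == 1), so a single observation can contribute to several boxes. A small sketch on the five sample rows from the question, using only base R for the reshape; the names `dummy_cols`, `long` and `selected` are illustrative.

```r
# The five sample rows from the question
NT_sample <- data.frame(
  id    = c(3, 4, 5, 6, 9),
  var01 = c(100, 0, 21, 100, 68),
  a0 = c(0, 0, 1, 0, 0),
  a1 = c(1, 1, 0, 1, 1),
  a2 = c(0, 0, 0, 1, 1),
  a3 = c(0, 0, 0, 0, 0),
  a4 = c(0, 0, 0, 1, 0),
  a5 = c(1, 1, 0, 0, 0)
)
dummy_cols <- paste0("a", 0:5)

# stack() puts the six dummy columns below each other (values, ind);
# id and var01 are repeated once per dummy column to match
long <- cbind(
  NT_sample[rep(seq_len(nrow(NT_sample)), times = length(dummy_cols)),
            c("id", "var01")],
  stack(NT_sample[dummy_cols])
)
names(long)[3:4] <- c("value", "a")

# Keep only the rows where the dummy is 1, as in p2
selected <- long[long$value == 1, ]

# Number of observations feeding each box
table(selected$a)
```

Here the five observations yield ten selected rows, and a3 has no rows at all in this sample, so it would get no box.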