
Boxplot of Linear Regression Model with several Dummy coded predictors in R

I have the following linear model:

model <- lm(var01 ~ a0 + a1 + a2 + a3 + a4 + a5,NT)

Here var01 is an interval-scaled variable ranging from 0 to 100, and a0-a5 are dummy-coded (0/1) variables. summary(model) gives:

Residuals:
    Min      1Q  Median      3Q     Max 
-75.951 -13.469  -7.239  18.795  80.531 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  59.6015     8.7076   6.845 5.48e-10 ***
a01         -46.1329     8.6302  -5.345 5.37e-07 ***
a11          -0.8744     9.0549  -0.097   0.9233    
a21          22.0408     9.1278   2.415   0.0175 *  
a31           9.5488     9.9284   0.962   0.3384    
a41          14.9227     7.6762   1.944   0.0546 .  
a51          -8.1222    11.8530  -0.685   0.4947    
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 32.13 on 104 degrees of freedom
Multiple R-squared:  0.4393,    Adjusted R-squared:  0.407 
F-statistic: 13.58 on 6 and 104 DF,  p-value: 2.486e-11
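Since every predictor is a 0/1 dummy, a coefficient is simply added to the intercept whenever its dummy equals 1. A worked reading of the output above, assuming a0-a5 were entered as two-level factors with reference level 0 (which the coefficient names a01 ... a51 suggest):

```latex
\begin{aligned}
\hat{y} &= \hat{\alpha} + \sum_{k=0}^{5} \hat{\beta}_k \, a_k \\
\text{only } a_0 = 1:\quad \hat{y} &= 59.6015 - 46.1329 = 13.4686 \\
\text{only } a_1 = a_5 = 1:\quad \hat{y} &= 59.6015 - 0.8744 - 8.1222 = 50.6049
\end{aligned}
```

These are model-based means for one combination of categories. A boxplot of var01 restricted to, say, a0 == 1 shows the raw observations instead, and because one respondent can belong to several categories, its median need not match any single coefficient.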

I would like to create a boxplot in which a0-a5 are displayed next to each other, but only for the cases with a0 == 1, a1 == 1, etc.

So I tried:

ggplot(NT, aes(factor(a0), var01)) +
  geom_boxplot() +
  geom_smooth(method = "lm", se=FALSE, color="black", aes(group=1))

But this shows the boxplots for a0 == 0 and a0 == 1 next to each other. So, two questions: How do I get R to show only a0 == 1? And how do I show the other five predictors a1-a5 next to a0 in the same graphic (again limited to a1 == 1, ..., a5 == 1)?

Any help is much appreciated. Thanks :)

Update: Sample data

id  category_a  var01   a0  a1  a2  a3  a4  a5
3   1;5          100    0   1   0   0   0   1
4   1;5            0    0   1   0   0   0   1
5   0             21    1   0   0   0   0   0
6   1;2;4        100    0   1   1   0   1   0
9   1;2           68    0   1   1   0   0   0

So a0-a5 are dummy codings of the multi-category variable "category_a", where one respondent can have several categories at once.
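The question does not show how a0-a5 were built from category_a. Assuming category_a stores ";"-separated codes from 0 to 5, as in the sample rows, a minimal base-R sketch of that dummy coding could look like this (the variable names codes and has_k are illustrative only):

```r
# Sample rows from the question; category_a holds ";"-separated category codes
NT <- data.frame(
  id         = c(3, 4, 5, 6, 9),
  category_a = c("1;5", "1;5", "0", "1;2;4", "1;2"),
  var01      = c(100, 0, 21, 100, 68),
  stringsAsFactors = FALSE
)

# Split every entry into its individual codes, e.g. "1;2;4" -> c("1", "2", "4")
codes <- strsplit(NT$category_a, ";", fixed = TRUE)

# One 0/1 indicator column per category 0..5 (assumed to be the full code range)
for (k in 0:5) {
  has_k <- vapply(codes, function(x) as.character(k) %in% x, logical(1))
  NT[[paste0("a", k)]] <- as.integer(has_k)
}

NT
```

The resulting a0-a5 columns reproduce the sample table above; a row such as "1;2;4" sets three dummies at once, which is why the categories are not mutually exclusive.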

This is a question of data reshaping. ggplot works best when each data point you're interested in is one row of a data frame (long format).

library(ggplot2)
library(reshape2)

# generate example data
set.seed(1)
n <- 1000
NT <- data.frame(id    = 1:n,
                 var01 = rnorm(n),
                 a0    = rbinom(n, 1, 0.2),
                 a1    = rbinom(n, 1, 0.2),
                 a2    = rbinom(n, 1, 0.2),
                 a3    = rbinom(n, 1, 0.2),
                 a4    = rbinom(n, 1, 0.2),
                 a5    = rbinom(n, 1, 0.2))

# reshape before plotting: ggplot needs each data point on its own row,
# so transform to long format (columns id, var01, a, value)
plotdata <- melt(NT, id.vars = c("id", "var01"), variable.name = "a")
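To see what the reshaping produces without any extra package, here is a base-R sketch that builds the same long layout (columns id, var01, a, value) from the five sample rows in the question. It stacks the dummy columns one after another, one row per (id, dummy) pair; the helper name dummies is just for illustration.

```r
# The five sample rows from the question (category_a dropped: a0-a5 encode it)
NT <- data.frame(
  id    = c(3, 4, 5, 6, 9),
  var01 = c(100, 0, 21, 100, 68),
  a0 = c(0, 0, 1, 0, 0), a1 = c(1, 1, 0, 1, 1), a2 = c(0, 0, 0, 1, 1),
  a3 = c(0, 0, 0, 0, 0), a4 = c(0, 0, 0, 1, 0), a5 = c(1, 1, 0, 0, 0)
)

dummies <- paste0("a", 0:5)

# Stack the dummy columns: the id/var01 block is repeated once per dummy,
# giving the same long layout as melt(NT, id.vars = c("id", "var01"), variable.name = "a")
plotdata <- data.frame(
  id    = rep(NT$id, times = length(dummies)),
  var01 = rep(NT$var01, times = length(dummies)),
  a     = factor(rep(dummies, each = nrow(NT)), levels = dummies),
  value = unlist(NT[dummies], use.names = FALSE)
)

nrow(plotdata)                        # 5 ids x 6 dummies = 30 rows
head(plotdata[plotdata$value == 1, ])
```

Keeping a as a factor with levels a0 ... a5 fixes the left-to-right order of the boxes on the x axis.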

Now it's very easy to plot everything:

#plot everything using interaction
p1 <- ggplot(plotdata, aes(x=interaction(a,value), y=var01)) +
  geom_boxplot()
p1


Or only a selection, here the rows where the dummy equals 1:

p2 <- ggplot(plotdata[plotdata$value==1,], 
             aes(x=a, y=var01)) +
  geom_boxplot()

p2
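Because the categories are not mutually exclusive, one respondent can appear in several boxes of p2. A quick way to check what each box summarizes is to compute the same subsets numerically. This is a base-R sketch on the question's five sample rows; the names by_dummy, n_per_box and median_box are illustrative only.

```r
# The five sample rows from the question
NT <- data.frame(
  id    = c(3, 4, 5, 6, 9),
  var01 = c(100, 0, 21, 100, 68),
  a0 = c(0, 0, 1, 0, 0), a1 = c(1, 1, 0, 1, 1), a2 = c(0, 0, 0, 1, 1),
  a3 = c(0, 0, 0, 0, 0), a4 = c(0, 0, 0, 1, 0), a5 = c(1, 1, 0, 0, 0)
)
dummies <- paste0("a", 0:5)

# For every dummy, keep var01 only where that dummy equals 1 (the data behind each box)
by_dummy <- lapply(setNames(dummies, dummies), function(d) NT$var01[NT[[d]] == 1])

n_per_box  <- vapply(by_dummy, length, integer(1))
median_box <- vapply(by_dummy, function(v) if (length(v)) median(v) else NA_real_,
                     numeric(1))

n_per_box              # a3 has no rows in this tiny sample, so p2 would draw no box for it
median_box
sum(n_per_box) > nrow(NT)  # TRUE: respondents with several categories are counted more than once
```

The medians printed here are the centre lines p2 would draw for this sample.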

