
Dropping factor levels to compare a control group against one disease: does it cause problems for regression/modelling, since the result is not an explicit dummy variable?

I have a doubt about using droplevels() on my dataset. My "Disease" column is a factor with five levels (a control group plus four etiologies):

BD$Etiología <- factor(BD$Etiología, levels=c(0,1,2,3,4) ,
labels= c("Control","Idiop","LMNA","BAG3","Isquémica"), ordered=FALSE)

Then I take a subset so that I can compare the control cases against just one of the diseases:

BD_C_ID <- subset(BD, Etiología=="Control" | Etiología=="Idiop")

BD_C_ID$Etiología= droplevels(BD_C_ID$Etiología) 

BD_C_ID$Etiología

[1] Control Control Control Control Control Control Control Idiop   Idiop   Control Control Control
[13] Control Idiop   Idiop   Idiop   Idiop   Idiop   Idiop   Idiop   Idiop   Idiop   Idiop   Idiop  
[25] Idiop   Idiop   Control Control Control Control Idiop   Control Control Control Control Control
[37] Idiop   Idiop   Idiop   Idiop  
Levels: Control Idiop

The original factor was unordered, and I only dropped the levels I don't use. Can I treat the result as a 0-1 coded variable and use it in lm or in a logistic regression, or will that cause a problem?

Also, does the same hold if I compare Control vs BAG3 (codes 0 and 3 in the original data)? Or do I need to re-create the factor so that the two groups are coded 0 and 1?

The short answer is that it doesn't matter. If you use the factor as a predictor in a linear model (lm) or a logistic regression (glm), the model takes the first level as the reference level, which in this case is always "Control", and codes the remaining level as a 0/1 dummy column. droplevels() is useful if you need to apply other functions to the factor, but if it is purely for lm() or glm(), these functions drop the unused levels themselves when they build the model frame.
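As a rough sketch of what "taking care of the factors" looks like, the snippet below rebuilds the model frame roughly the way lm() does, with drop.unused.levels = TRUE, and inspects the resulting design matrix. The data are simulated, and the names sub, mf and mm are only illustrative, not part of your dataset.

```r
# Simulated stand-in for the real data (illustrative only)
set.seed(111)
BD <- data.frame(
  Etiologia = sample(0:4, 100, replace = TRUE),
  x = rnorm(100),
  y = rnorm(100)
)
BD$E <- factor(BD$Etiologia, levels = 0:4,
               labels = c("Control", "Idiop", "LMNA", "BAG3", "Isquemica"))

# Subset without calling droplevels(): the factor still carries all five levels
sub <- subset(BD, E %in% c("Control", "BAG3"))
levels(sub$E)

# Build the model frame with unused levels dropped, as lm() and glm() do internally
mf <- model.frame(y ~ x + E, data = sub, drop.unused.levels = TRUE)
levels(mf$E)

# The design matrix has a single 0/1 column for the non-reference level
mm <- model.matrix(attr(mf, "terms"), mf)
colnames(mm)
head(mm)
```

The EBAG3 column is 1 for BAG3 rows and 0 for Control rows, regardless of the fact that BAG3 was coded 3 in the raw data.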

To illustrate this with simulated data shaped like your example:

set.seed(111)
BD = data.frame(
  Etiologia = sample(0:4, 100, replace = TRUE),
  x = rnorm(100),
  y = rnorm(100)
)

We can just do:

BD$E <- factor(BD$Etiologia,levels=0:4,
labels= c("Control","Idiop","LMNA","BAG3","Isquemica"))

lm(y ~ x + E,data=subset(BD,E %in% c("Control","Idiop")))

Call:
lm(formula = y ~ x + E, data = subset(BD, E %in% c("Control", "Idiop")))

Coefficients:
(Intercept)            x       EIdiop  
   -0.05524      0.21596      0.30433 

And for another comparison (Control vs BAG3):

lm(y ~ x + E,data=subset(BD,E %in% c("Control","BAG3")))

Call:
lm(formula = y ~ x + E, data = subset(BD, E %in% c("Control", 
    "BAG3")))

Coefficients:
(Intercept)            x        EBAG3  
   -0.03355      0.08978     -0.21708  

You get the same result as the first fit if you subset and drop the levels explicitly:

BD$Etiologia <- factor(BD$Etiologia, levels=c(0,1,2,3,4) ,
labels= c("Control","Idiop","LMNA","BAG3","Isquemica"), ordered=FALSE)

BD_C_ID <- droplevels(subset(BD, Etiologia=="Control" | Etiologia=="Idiop"))

lm(y ~ x + Etiologia,data=BD_C_ID)

Call:
lm(formula = y ~ x + Etiologia, data = BD_C_ID)

Coefficients:
   (Intercept)               x  EtiologiaIdiop  
      -0.05524         0.21596         0.30433  
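Since the question also mentions logistic regression, here is a hedged sketch for the case where the two-group factor is the response rather than a predictor. With family = binomial, glm() treats the first level of a factor response as failure (0) and every other level as success (1), so no manual recoding should be needed. The data are simulated again, and the column name case is purely illustrative.

```r
# Simulated stand-in for the real data (illustrative only)
set.seed(111)
BD <- data.frame(
  Etiologia = sample(0:4, 100, replace = TRUE),
  x = rnorm(100),
  y = rnorm(100)
)
BD$E <- factor(BD$Etiologia, levels = 0:4,
               labels = c("Control", "Idiop", "LMNA", "BAG3", "Isquemica"))

sub <- subset(BD, E %in% c("Control", "BAG3"))

# Factor response: "Control" (first level) is failure, "BAG3" is success
fit_factor <- glm(E ~ x + y, family = binomial, data = sub)

# Hand-made 0/1 response for comparison (hypothetical column name)
sub$case <- as.integer(sub$E != "Control")
fit_01 <- glm(case ~ x + y, family = binomial, data = sub)

# Both fits should give the same coefficients
all.equal(coef(fit_factor), coef(fit_01))
```

The same reasoning applies to Control vs Idiop or any other pair, as long as "Control" remains the first level of the factor.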
