model.matrix dropping a column

I have a data frame that I want to use to generate a design matrix.

>ct<-read.delim(filename, skip=0, as.is=TRUE, sep="\t", row.names = 1)
> ct
      s2 s6 S10 S14 S3 S7 S11 S15 S4 S8 S12 S16
group  1  1   1   1  2  2   2   2  3  3   3   3
donor  1  2   3   4  1  2   3   4  1  2   3   4
>factotum<-apply(ct,1,as.factor) # to turn rows into factors. 
>design <- model.matrix(~0 + factotum[,1] + factotum[,2])

Eventually, I'll generate a string and use as.formula() instead of hard-coding the formula. Anyway, this works and produces a design matrix. It leaves a column out, though.
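As a sketch of the string-to-formula step, assuming the factors live in a data frame with the hypothetical name `pheno` and columns named `group` and `donor` (in the question they are columns of `factotum`):

```r
# Hypothetical sample annotation: one row per sample, one factor per column
pheno <- data.frame(
  group = factor(rep(1:3, each = 4)),
  donor = factor(rep(1:4, times = 3))
)

# Build "~0 + group + donor" from the column names instead of hard-coding it
vars <- colnames(pheno)
f <- as.formula(paste("~0 +", paste(vars, collapse = " + ")))

design <- model.matrix(f, data = pheno)
colnames(design)
```

Naming the columns up front also means model.matrix() produces readable names such as group1 directly, so no renaming step is needed afterwards.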

>design
   factotum[, 1]1 factotum[, 1]2 factotum[, 1]3 factotum[, 2]2 factotum[, 2]3 factotum[, 2]4
1               1              0              0              0              0              0
2               1              0              0              1              0              0
3               1              0              0              0              1              0
4               1              0              0              0              0              1
5               0              1              0              0              0              0
6               0              1              0              1              0              0
7               0              1              0              0              1              0
8               0              1              0              0              0              1
9               0              0              1              0              0              0
10              0              0              1              1              0              0
11              0              0              1              0              1              0
12              0              0              1              0              0              1

By my reasoning, the column names should be: factotum[, 1]1, factotum[, 1]2, factotum[, 1]3, factotum[, 2]1, factotum[, 2]2, factotum[, 2]3, factotum[, 2]4. These would be renamed as group1, group2, group3, donor1, donor2, donor3, donor4.

Which means that factotum[, 2]1, or donor1, is missing. What am I doing that causes this column to be missing? Any help would be appreciated.

Cheers, Ben.

There are several things going on here.

(1) apply(ct,1,as.factor) doesn't necessarily turn the rows into factors. Try str(factotum) and you'll see that it failed: apply() simplifies the per-row results into a matrix, and a matrix cannot hold factors. I'm not sure what the fastest way is, but this should work:

factotum <- data.frame(lapply(data.frame(t(ct)), as.factor))
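To compare the two approaches side by side, here is a minimal sketch that rebuilds the example table inline (the values are typed in from the printout in the question, since the original file is not available):

```r
# Rebuild ct from the printout: rows are variables, columns are samples
ct <- as.data.frame(rbind(
  group = rep(1:3, each = 4),
  donor = rep(1:4, times = 3)
))
colnames(ct) <- c("s2", "s6", "S10", "S14", "S3", "S7", "S11", "S15",
                  "S4", "S8", "S12", "S16")

# Original approach: apply() simplifies its results into a matrix,
# so the factor class is lost
bad <- apply(ct, 1, as.factor)
str(bad)

# Suggested approach: transpose so samples are rows, then convert each column
good <- data.frame(lapply(data.frame(t(ct)), as.factor))
str(good)
```

str(bad) should show a plain (character) matrix with 12 rows and 2 columns, while str(good) shows a data frame with two factors. The question's call still produced a design matrix because model.matrix() converts character variables to factors, but keeping real factors in a data frame is easier to work with.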

(2) Since you are working with factors, model.matrix creates dummy coding. In this case, donor has four values. If you are 2, you get a 1 in the column factotum[,2]2. If you are 3 or 4, you get a 1 in their respective columns. So what if you are a 1? Well, that simply means you have a 0 in all three columns. In this way, you only need three columns to encode four groups. The value 1 for donor is called the reference group here: the group with which the other groups are compared.
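The dummy coding can be inspected directly. This sketch assumes R's default contrasts setting (treatment contrasts for unordered factors) and uses a standalone `donor` factor with the same layout as the example:

```r
# The donor factor from the example: four donors, three samples each
donor <- factor(rep(1:4, times = 3))

# Treatment contrasts: the row for level "1" is all zeros (the reference)
contrasts(donor)

X <- model.matrix(~ donor)
colnames(X)  # "(Intercept)" "donor2" "donor3" "donor4"

# Every donor-1 sample has 0 in all three donor columns
X[donor == "1", -1]
```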

(3) So now the question is: why doesn't group (or factotum[,1]) also have only TWO columns? We could easily code three levels with two columns, right? Well, yes, this is exactly what happens when you use:

design <- model.matrix(~ factotum[,1] + factotum[,2])

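However, since you specified that there is no intercept (the 0 in ~0 + ...), you get an extra column for group: without an intercept, the first factor in the formula is coded with one column per level, while the remaining factors still drop their reference level.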
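A short sketch comparing the two formulas, using a hypothetical data frame `pheno` laid out like the example. It also checks with qr() why donor1 cannot simply be added back: the three group columns and the four donor columns each sum to 1 in every row, so including all seven indicators makes them linearly dependent.

```r
pheno <- data.frame(
  group = factor(rep(1:3, each = 4)),
  donor = factor(rep(1:4, times = 3))
)

with_int <- model.matrix(~ group + donor, data = pheno)
no_int   <- model.matrix(~ 0 + group + donor, data = pheno)
colnames(with_int)  # "(Intercept)" "group2" "group3" "donor2" "donor3" "donor4"
colnames(no_int)    # "group1" "group2" "group3" "donor2" "donor3" "donor4"

# Both codings have 6 linearly independent columns
qr(with_int)$rank
qr(no_int)$rank

# Adding the "missing" donor1 indicator gives 7 columns but still rank 6
full <- cbind(no_int, donor1 = as.numeric(pheno$donor == "1"))
qr(full)$rank
```

So a design matrix with all seven columns would not be of full rank, and the coefficients could not be estimated uniquely.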

(4) Usually you don't have to create the design matrix yourself. I'm not sure what function you want to use next, but in most cases the modelling function builds it for you from the formula.
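As one illustration, lm() builds the design matrix from the formula internally. The response `y` below is simulated purely so there is something to fit; it is not data from the question, and the function you actually need next may differ.

```r
set.seed(1)
pheno <- data.frame(
  group = factor(rep(1:3, each = 4)),
  donor = factor(rep(1:4, times = 3))
)
# Hypothetical response, simulated only for illustration
pheno$y <- rnorm(nrow(pheno))

fit <- lm(y ~ group + donor, data = pheno)

# The design matrix lm() built internally matches the one built by hand
X_fit    <- model.matrix(fit)
X_manual <- model.matrix(~ group + donor, data = pheno)
identical(colnames(X_fit), colnames(X_manual))
all(X_fit == X_manual)
```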
