
model.matrix dropping a column

I have a data frame that I want to use to generate a design matrix.

>ct<-read.delim(filename, skip=0, as.is=TRUE, sep="\t", row.names = 1)
> ct
      s2 s6 S10 S14 S3 S7 S11 S15 S4 S8 S12 S16
group  1  1   1   1  2  2   2   2  3  3   3   3
donor  1  2   3   4  1  2   3   4  1  2   3   4
>factotum<-apply(ct,1,as.factor) # to turn rows into factors. 
>design <- model.matrix(~0 + factotum[,1] + factotum[,2])

Eventually, I'll generate a string and use as.formula() instead of hard coding the formula. Anyway, this works and produces a design matrix. It leaves a column out though.

>design
   factotum[, 1]1 factotum[, 1]2 factotum[, 1]3 factotum[, 2]2 factotum[, 2]3 factotum[, 2]4
1               1              0              0              0              0              0
2               1              0              0              1              0              0
3               1              0              0              0              1              0
4               1              0              0              0              0              1
5               0              1              0              0              0              0
6               0              1              0              1              0              0
7               0              1              0              0              1              0
8               0              1              0              0              0              1
9               0              0              1              0              0              0
10              0              0              1              1              0              0
11              0              0              1              0              1              0
12              0              0              1              0              0              1

By my reasoning, the column names should be: factotum[, 1]1, factotum[, 1]2, factotum[, 1]3, factotum[, 2]1, factotum[, 2]2, factotum[, 2]3, factotum[, 2]4. These would be renamed as group1, group2, group3, donor1, donor2, donor3, donor4.

That means factotum[, 2]1, or donor1, is missing. What am I doing that causes it to be dropped? Any help would be appreciated.

Cheers Ben.

There are several things here.

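(1) apply(ct, 1, as.factor) doesn't turn the rows into factors. apply() returns a matrix, and a matrix cannot hold factors, so the factor structure is lost on the way out. Try str(factotum) and you'll see that it failed. Your model.matrix call still runs because the columns get converted to factors again inside it. I'm not sure what the fastest way is, but this should work: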

factotum <- data.frame(lapply(data.frame(t(ct)), as.factor))
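To check the difference yourself, here is a small sketch. It rebuilds ct by hand from the table printed in the question (an assumption, since the original file isn't available) and compares the two approaches:

```r
# Rebuild the data from the question (values copied from the printed table).
ct <- data.frame(
  rbind(group = rep(1:3, each = 4), donor = rep(1:4, times = 3))
)
colnames(ct) <- c("s2", "s6", "S10", "S14", "S3", "S7",
                  "S11", "S15", "S4", "S8", "S12", "S16")

# The original approach: the result is a plain matrix, not a set of factors.
bad <- apply(ct, 1, as.factor)
is.matrix(bad)
is.factor(bad[, 1])

# The suggested approach: transpose, then convert each column to a factor.
factotum <- data.frame(lapply(data.frame(t(ct)), as.factor))
str(factotum)
sapply(factotum, is.factor)
```

The columns of the new factotum are named group and donor, so you can refer to them by name instead of by position.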

(2) Since you are working with factors, model.matrix creates dummy coding. In this case, donor has four levels. A row with donor 2 gets a 1 in the column factotum[, 2]2. Rows with donor 3 or 4 get a 1 in their respective columns. So what about donor 1? Those rows simply have 0 in all three columns. In this way, three columns are enough to encode four groups. The level 1 of donor is called the reference group here: it is the group with which the other groups are compared.
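A minimal sketch of that coding, using a stand-alone donor factor built to match the question's layout (the variable names here are just for illustration):

```r
# A factor with four levels, laid out like the donor row in the question.
donor <- factor(rep(1:4, times = 3))

# The default treatment contrasts: level 1 is the reference (all zeros).
contrasts(donor)

# Design matrix with an intercept: three dummy columns encode four levels.
m <- model.matrix(~ donor)
colnames(m)

# Rows where donor is 1 have 0 in every dummy column.
m[donor == "1", ]
```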

(3) So now the question is... why doesn't group (or factotum[, 1]) have only TWO columns? We could easily code three levels with two columns, right? Well... yes, and this is exactly what happens when you use:

design <- model.matrix(~ factotum[,1] + factotum[,2])

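However, since you specify that there is no intercept (the 0 in your formula), the first factor in the formula, group, gets a column for every one of its levels. donor is still coded with three columns: a donor1 column would be redundant, because the four donor columns would add up to the same thing as the three group columns.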
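Here is a sketch comparing the two formulas. It assumes a data frame d with factor columns named group and donor, built by hand to match the question's layout. Passing it through the data argument also gives you the column names group1, donor2, etc. directly, so no renaming is needed:

```r
# Hypothetical data frame matching the question's layout.
d <- data.frame(
  group = factor(rep(1:3, each = 4)),
  donor = factor(rep(1:4, times = 3))
)

with_int <- model.matrix(~ group + donor, data = d)      # intercept kept
no_int   <- model.matrix(~ 0 + group + donor, data = d)  # intercept removed

colnames(with_int)
colnames(no_int)

# Both versions have 6 columns and full column rank.
qr(no_int)$rank

# Adding donor1 by hand gives 7 columns, but the rank does not increase,
# which is why model.matrix leaves that column out.
full <- cbind(no_int, donor1 = as.numeric(d$donor == "1"))
ncol(full)
qr(full)$rank
```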

(4) Usually you don't have to create the design matrix yourself. I'm not sure what function you want to use next, but in most cases the modelling functions build it from the formula for you.

