
Create a matrix of 0's and 1's from a data frame using R

How can I create a matrix of 0's and 1's from a data set with three columns, hosp (hospital), pid (patient ID) and treatment, as shown below?

df<-
structure(list(
hosp=c(1L,1L,1L,1L,1L,1L,2L,2L,2L),
pid=c(1L,1L,1L,2L,3L,3L,4L,5L,5L),
treatment=c(0L,0L,0L,1L,1L,1L,0L,1L,1L)
),
.Names=c("hosp","pid","treatment"),
class="data.frame",row.names=c(NA,-9))

The matrix should have one row per observation (9 in this case) and one column per unique hospital. Each entry should be the treatment value: 1 in a hospital's column if the patient in that row received treatment 1 in that hospital, and 0 otherwise. The matrix should look like this:

matrix(c(0,0,
0,0,
0,0,
1,0,
1,0,
1,0,
0,0,
0,1,
0,1),nrow=9,byrow=TRUE)

Any help would be much appreciated, thanks.

1) Create a model matrix from hosp as a factor with no intercept term, and multiply it by treatment:

hosp <- factor(df$hosp)
model.matrix(~ hosp + 0) * df$treatment

giving:

  hosp1 hosp2
1     0     0
2     0     0
3     0     0
4     1     0
5     1     0
6     1     0
7     0     0
8     0     1
9     0     1
attr(,"assign")
[1] 1 1
attr(,"contrasts")
attr(,"contrasts")$hosp
[1] "contr.treatment"
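
The printed result still carries the assign and contrasts attributes that model.matrix attaches. If a plain matrix is wanted, one option is to drop them afterwards. A minimal sketch, assuming the df from the question; the names mm and expected are introduced here for illustration only:

```r
# Rebuild the example data from the question
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# Indicator columns for hosp (no intercept), scaled row-wise by treatment
mm <- model.matrix(~ hosp + 0,
                   data = transform(df, hosp = factor(hosp))) * df$treatment

# Drop the model-matrix bookkeeping attributes; dim and dimnames are kept
attr(mm, "assign") <- NULL
attr(mm, "contrasts") <- NULL

# The matrix requested in the question, used here as a check
expected <- matrix(c(0, 0,  0, 0,  0, 0,  1, 0,  1, 0,
                     1, 0,  0, 0,  0, 1,  0, 1),
                   nrow = 9, byrow = TRUE)
stopifnot(all(mm == expected))

mm
```

Passing the factor through the data argument, rather than relying on a global hosp variable, is just one way to keep the sketch self-contained.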

2) outer(hosp, unique(hosp), "==") is the model matrix of hosp, except that it uses TRUE/FALSE in place of 1/0. Multiply that by treatment:

with(df, outer(hosp, unique(hosp), "==") * treatment)

giving:

      [,1] [,2]
 [1,]    0    0
 [2,]    0    0
 [3,]    0    0
 [4,]    1    0
 [5,]    1    0
 [6,]    1    0
 [7,]    0    0
 [8,]    0    1
 [9,]    0    1
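
Unlike the model matrix in (1), this result has no column names, and its columns follow the order in which hospitals first appear in the data rather than sorted factor levels. If labelled, sorted columns are wanted, a sketch along these lines may help. It assumes the df from the question; the names hosp_ids and m and the "hosp" prefix are illustrative choices:

```r
# Rebuild the example data from the question
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# Sorted hospital ids, so the column order does not depend on row order
hosp_ids <- sort(unique(df$hosp))

# TRUE/FALSE indicator matrix times treatment gives the 0/1 matrix
m <- outer(df$hosp, hosp_ids, "==") * df$treatment

# Label the columns the same way model.matrix would
colnames(m) <- paste0("hosp", hosp_ids)

m
```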

Update: Added (1) and simplified (2).

Here's my workaround for this. Not the cleanest, but it works!

library(dplyr)
library(tidyr)  # spread() and gather() are tidyr functions, not dplyr

df2 <- df %>% 
  mutate(x = row_number()) %>% 
  select(-pid) %>% 
  spread(x, treatment)

df3 <- df2 %>% 
  gather("keys", "value", 2:10) %>% 
  spread(hosp, value) %>% 
  select(-keys)

df3[is.na(df3)] <- 0
df3 <- as.matrix(df3)

Step by step:

Take the original df and add a row number to it so that we can spread without duplicate keys. We'll also remove pid, since you're turning this into a matrix.

library(dplyr)
library(tidyr)  # spread() and gather() are tidyr functions, not dplyr

df2 <- df %>% 
  mutate(x = row_number()) %>% 
  select(-pid) %>% 
  spread(x, treatment)

Then we gather it back to long form and spread again, this time by hosp:

df3 <- df2 %>% 
  gather("keys", "value", 2:10) %>% 
  spread(hosp, value) %>% 
  select(-keys)

Some of the values are still NA, so we convert them to 0s and then turn the result into a matrix using as.matrix:

df3[is.na(df3)] <- 0
df3 <- as.matrix(df3)

  1 2
1 0 0
2 0 0
3 0 0
4 1 0
5 1 0
6 1 0
7 0 0
8 0 1
9 0 1

How about:

> sapply(unique(df$hosp),function(x) ifelse(df$hosp==x&df$treatment==1,1,0))
      [,1] [,2]
 [1,]    0    0
 [2,]    0    0
 [3,]    0    0
 [4,]    1    0
 [5,]    1    0
 [6,]    1    0
 [7,]    0    0
 [8,]    0    1
 [9,]    0    1
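
As a quick sanity check, the sapply result can be compared with the matrix requested in the question. This is only a sketch that assumes the df from the question; res and expected are names introduced here:

```r
# Rebuild the example data from the question
df <- data.frame(
  hosp      = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L),
  pid       = c(1L, 1L, 1L, 2L, 3L, 3L, 4L, 5L, 5L),
  treatment = c(0L, 0L, 0L, 1L, 1L, 1L, 0L, 1L, 1L)
)

# One column per hospital: 1 where the row belongs to that hospital
# and treatment is 1, otherwise 0
res <- sapply(unique(df$hosp),
              function(x) ifelse(df$hosp == x & df$treatment == 1, 1, 0))

# The matrix requested in the question
expected <- matrix(c(0, 0,  0, 0,  0, 0,  1, 0,  1, 0,
                     1, 0,  0, 0,  0, 1,  0, 1),
                   nrow = 9, byrow = TRUE)

# Stops with an error if shape or entries differ
stopifnot(identical(dim(res), dim(expected)), all(res == expected))
```

Note that this version tests treatment == 1 explicitly, so it agrees with the multiplication-based answers only because treatment takes the values 0 and 1.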
