
How to fill a dataset with 0s and 1s for values that match in row-column, in R?

I have a dataset in a CSV file that looks as follows:

  X               Colour Orange Red White Violet Black Yellow Blue
1 1          Orange, Red     NA  NA    NA     NA    NA     NA   NA
2 2                  Red     NA  NA    NA     NA    NA     NA   NA
3 3         White, Black     NA  NA    NA     NA    NA     NA   NA
4 4               Yellow     NA  NA    NA     NA    NA     NA   NA
5 5 Blue, Orange, Violet     NA  NA    NA     NA    NA     NA   NA

I'm trying to fill each colour column with a 1 where that colour appears in the row's Colour value and a 0 where it does not. The expected output is:

  Colour               Orange Red White Violet Black Yellow Blue
1 Orange, Red               1   1     0      0     0      0    0
2 Red                       0   1     0      0     0      0    0
3 White, Black              0   0     1      0     1      0    0
4 Yellow                    0   0     0      0     0      1    0
5 Blue, Orange, Violet      1   0     0      1     0      0    1

How can I achieve this in R?

Loop across the column names and use grepl to check whether each name occurs in the Colour string of each row; adding 0 converts the logical result to 0/1:

dat[-(1:2)] <-  sapply( colnames(dat[-(1:2)]), grepl, x=dat$Colour  ) + 0

#  X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
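
One caveat: grepl does substring matching, so a column name that is contained in a longer colour name would also be flagged (a hypothetical "Red" column would match "Dark Red"). The following is a minimal sketch of an exact-match variant, not part of the original answer. It rebuilds the example data by hand (an assumption about what reading the CSV produces) and assumes colours are separated by a comma plus optional spaces; `cols` and `parts` are illustrative names.

```r
# Rebuild the example data by hand (assumed to mirror the CSV contents)
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
cols <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")
dat[cols] <- NA  # placeholder columns, as in the question

# Split each row's string into its individual colour names
parts <- strsplit(dat$Colour, ",\\s*")

# For each target column, test exact membership in each row's colours;
# adding 0 turns the logical matrix into 0/1
dat[cols] <- sapply(cols, function(col) {
  vapply(parts, function(p) col %in% p, logical(1))
}) + 0

print(dat)
```

Because each colour is compared as a whole string, partial overlaps between colour names do not produce false matches.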

I'm not sure whether you added the NA columns yourself. Even without any placeholder NA columns, we can use strsplit to split the "Colour" column, apply mtabulate to the resulting list and, if needed, reorder the output to match the column names of 'dat':

library(qdapTools)
cbind(dat[1:2], mtabulate(strsplit(dat$Colour, ', ')))[names(dat)]
#  X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
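
If installing qdapTools is not an option, the same split-then-tabulate idea can be sketched in base R. This is an illustrative alternative rather than the answer's method: it pairs every colour with the row it came from and cross-tabulates with table. The names `parts`, `long` and `indicators` are made up for this example, and the data is again rebuilt by hand.

```r
# Example data without the placeholder NA columns
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)

# One character vector of colours per row
parts <- strsplit(dat$Colour, ", ", fixed = TRUE)

# Long format: one (row, colour) pair per colour occurrence.
# The factor levels keep a row in the table even if it has no colours.
long <- data.frame(
  row = factor(rep(seq_along(parts), lengths(parts)),
               levels = seq_along(parts)),
  colour = unlist(parts)
)

# Cross-tabulate rows against colours and convert to a data frame
indicators <- as.data.frame.matrix(table(long$row, long$colour))

result <- cbind(dat, indicators)
print(result)
```

The colour columns come out in alphabetical order here; index the result by a vector of column names to impose a different order.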

A similar approach is to use cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)
