
How do I fill a dataset with 0s and 1s marking which column names appear in each row, in R?

I have a dataset in a CSV file that looks like this:

 X               Colour Orange Red White Violet Black Yellow Blue
1 1          Orange, Red     NA  NA    NA     NA    NA     NA   NA
2 2                  Red     NA  NA    NA     NA    NA     NA   NA
3 3         White, Black     NA  NA    NA     NA    NA     NA   NA
4 4               Yellow     NA  NA    NA     NA    NA     NA   NA
5 5 Blue, Orange, Violet     NA  NA    NA     NA    NA     NA   NA
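To make the example reproducible, the input can be rebuilt directly in R. This is a sketch that assumes the colour columns are plain logical NA placeholders, which is what read.csv() produces for empty columns; the `colours` vector is a helper name introduced here.

```r
# Rebuild the example input; the colour columns start out as NA placeholders
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)

# Hypothetical helper vector holding the colour column names, in the order shown above
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")
dat[colours] <- NA
str(dat)
```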

I'm trying to put a 1 wherever a row's Colour value contains that column's name, and a 0 everywhere else. The expected output is:

              Colour Orange Red White Violet Black Yellow Blue
1         Orange,Red      1   1     0      0     0      0    0
2                Red      0   1     0      0     0      0    0
3        White,Black      0   0     1      0     1      0    0
4             Yellow      0   0     0      0     0      1    0
5 Blue,Orange,Violet      1   0     0      1     0      0    1

How can I achieve this in R?

Loop over the colour column names and use grepl to check whether each one appears in the Colour string:

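# grepl() is called once per colour column and returns TRUE/FALSE for every row; "+ 0" turns that into 1/0
dat[-(1:2)] <- sapply(colnames(dat)[-(1:2)], grepl, x = dat$Colour) + 0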

#  X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
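One caveat: grepl matches substrings, so a column named Red would also match a hypothetical entry such as "Redwood". A minimal sketch that anchors each pattern on word boundaries with \b (R's default regex engine supports it); the data frame is rebuilt here so the snippet runs on its own, and `colours` and `flags` are illustrative names.

```r
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

# \\b anchors each colour name on word boundaries, so "Red" will not
# match inside a longer word such as a hypothetical "Redwood"
flags <- sapply(colours, function(col) grepl(paste0("\\b", col, "\\b"), dat$Colour))

# flags is a 5 x 7 logical matrix; adding 0L converts it to integer 0/1
dat[colours] <- flags + 0L
dat
```

The result has the same layout as the grepl answer; only the matching rule is stricter.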

I'm not sure whether you added the NA columns yourself. Even without those placeholder columns, we can split the "Colour" column with strsplit, apply mtabulate (from qdapTools) to the resulting list and, if needed, reorder the output to match the column names of 'dat':

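library(qdapTools)
# split each Colour string on ", ", count the colours per row, then restore the column order of 'dat'
# (as.character() guards against Colour having been read in as a factor)
cbind(dat[1:2], mtabulate(strsplit(as.character(dat$Colour), ', ')))[names(dat)]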
#   X               Colour Orange Red White Violet Black Yellow Blue
#1 1          Orange, Red      1   1     0      0     0      0    0
#2 2                  Red      0   1     0      0     0      0    0
#3 3         White, Black      0   0     1      0     1      0    0
#4 4               Yellow      0   0     0      0     0      1    0
#5 5 Blue, Orange, Violet      1   0     0      1     0      0    1
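If you'd rather not depend on qdapTools, the strsplit idea can be sketched in base R by testing exact membership with %in% instead of calling mtabulate. This is a swapped-in base-R technique, and it assumes the wanted colour columns are known up front (the `colours` vector below is written out by hand).

```r
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

# Split "Blue, Orange, Violet" into c("Blue", "Orange", "Violet"); the pattern
# ",\\s*" also tolerates entries written without a space, e.g. "Orange,Red"
parts <- strsplit(dat$Colour, ",\\s*")

# For each colour, test exact membership in every row's split vector (no substring matches)
dat[colours] <- lapply(colours, function(col) {
  vapply(parts, function(p) as.integer(col %in% p), integer(1))
})
dat
```

Because %in% compares whole elements, this avoids the substring issue that plain grepl has.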

A similar approach is to use cSplit_e from the splitstackshape package:

library(splitstackshape)
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)
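If you want the indicator columns to be derived from the data itself (rather than from pre-made NA columns) without adding a package, the idea can be sketched in base R with stack() and table(). This is a swapped-in base-R technique, not a description of what cSplit_e does internally; the Colour_ prefix and the names `parts`, `long`, `indicator` and `result` are hypothetical choices for illustration.

```r
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)

# Split each Colour string and name the pieces by row id
parts <- strsplit(dat$Colour, ",\\s*")
names(parts) <- seq_len(nrow(dat))

# stack() gives a long data frame: 'values' (colour) and 'ind' (row id)
long <- stack(parts)

# Cross-tabulate row id against colour; the columns come from the values found
counts <- table(long$ind, long$values)
indicator <- as.data.frame.matrix(counts)

# Force 0/1 in case a colour is ever listed twice in the same row
indicator[] <- lapply(indicator, function(v) as.integer(v > 0))

# Hypothetical "Colour_" prefix, chosen here for illustration
names(indicator) <- paste0("Colour_", names(indicator))
result <- cbind(dat, indicator)
result
```

Note that table() sorts the colour columns alphabetically, so reorder them afterwards if the original column order matters.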
