I have a dataset in a CSV file that looks like this:
  X               Colour Orange Red White Violet Black Yellow Blue
1 1          Orange, Red     NA  NA    NA     NA    NA     NA   NA
2 2                  Red     NA  NA    NA     NA    NA     NA   NA
3 3         White, Black     NA  NA    NA     NA    NA     NA   NA
4 4               Yellow     NA  NA    NA     NA    NA     NA   NA
5 5 Blue, Orange, Violet     NA  NA    NA     NA    NA     NA   NA
For every row, I'd like to put a 1 in each colour column whose name appears in that row's Colour value, and a 0 otherwise. The expected output is:
                Colour Orange Red White Violet Black Yellow Blue
1          Orange, Red      1   1     0      0     0      0    0
2                  Red      0   1     0      0     0      0    0
3         White, Black      0   0     1      0     1      0    0
4               Yellow      0   0     0      0     0      1    0
5 Blue, Orange, Violet      1   0     0      1     0      0    1
How can I achieve this in R?
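To make the answers below easy to try, here is a minimal sketch that rebuilds the sample data directly in R instead of reading the CSV. It assumes the file was read with read.csv() into a data frame called dat (the name the answers use) and that Colour is a character column; the colours vector is a helper name introduced only for this illustration.

```r
# Rebuild the question's sample data (assumed structure of the CSV)
dat <- data.frame(
  X      = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE  # keep Colour as character for grepl()/strsplit()
)

# Hypothetical helper: one placeholder column per colour, all NA as in the CSV
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")
dat[colours] <- NA

dat
```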
Loop over the colour column names and check whether each one appears in the Colour string using grepl:
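# For each colour column name, grepl() flags the rows whose Colour contains it;
# sapply() returns a TRUE/FALSE matrix and + 0 turns it into 1/0
dat[-(1:2)] <- sapply( colnames(dat[-(1:2)]), grepl, x=dat$Colour ) + 0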
# X Colour Orange Red White Violet Black Yellow Blue
#1 1 Orange, Red 1 1 0 0 0 0 0
#2 2 Red 0 1 0 0 0 0 0
#3 3 White, Black 0 0 1 0 1 0 0
#4 4 Yellow 0 0 0 0 0 1 0
#5 5 Blue, Orange, Violet 1 0 0 1 0 0 1
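One caveat with this grepl approach is that it matches substrings, so a column named Red would also match a value such as "Reddish" (a hypothetical value, not in the question's data). A minimal sketch that only matches whole colour names wraps each name in the \\b word-boundary anchor supported by R's default regex engine. The small data frame and the "Reddish" entry are made up for illustration.

```r
# Illustrative data; "Reddish" is a hypothetical value to show the pitfall
dat <- data.frame(
  X      = 1:3,
  Colour = c("Orange, Red", "Reddish", "White, Black"),
  stringsAsFactors = FALSE
)
colours <- c("Orange", "Red", "White", "Black")

# \\b anchors each name at word boundaries, so "Red" does not match "Reddish"
for (col in colours) {
  dat[[col]] <- as.integer(grepl(paste0("\\b", col, "\\b"), dat$Colour))
}

dat
```

With plain grepl("Red", ...) the second row would get a 1 in the Red column; with the anchored pattern it stays 0.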
I'm not sure whether you added the NA columns yourself. Even without those placeholder NA columns, we can use strsplit to split the "Colour" column, apply mtabulate from qdapTools to the resulting list, and, if needed, reorder the columns to match the column names of 'dat':
library(qdapTools)
# strsplit() needs a character column; wrap with as.character() if Colour is a factor
cbind(dat[1:2], mtabulate(strsplit(dat$Colour, ', ')))[names(dat)]
# X Colour Orange Red White Violet Black Yellow Blue
#1 1 Orange, Red 1 1 0 0 0 0 0
#2 2 Red 0 1 0 0 0 0 0
#3 3 White, Black 0 0 1 0 1 0 0
#4 4 Yellow 0 0 0 0 0 1 0
#5 5 Blue, Orange, Violet 1 0 0 1 0 0 1
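If you would rather not depend on qdapTools, a similar 0/1 matrix can be built in base R. This is a sketch of an equivalent approach, not mtabulate itself: table() cross-tabulates each row index against the colours split out of that row, and giving the colours explicit factor levels fixes the column order to match the question. The colours vector is an assumed helper.

```r
dat <- data.frame(
  X      = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
colours <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

parts  <- strsplit(dat$Colour, ", ")              # one character vector per row
row_id <- rep(seq_along(parts), lengths(parts))   # row index, repeated per colour

# Rows = original row index, columns = colours in the desired order
tab <- table(factor(row_id, levels = seq_along(parts)),
             factor(unlist(parts), levels = colours))

res <- cbind(dat[1:2], as.data.frame.matrix(tab))
res
```

A colour listed twice in the same row would give a count of 2 here rather than 1; wrap the table in pmin(tab, 1) if strict 0/1 indicators are required.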
Another option is cSplit_e from splitstackshape:
library(splitstackshape)
# one indicator column per colour; fill = 0 writes 0 instead of NA where a colour is absent
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)