How to fill a dataset with 0s and 1s for values that match in row-column, in R?
I have a dataset in a CSV file that looks as follows:
  X Colour               Orange Red White Violet Black Yellow Blue
1 1 Orange, Red          NA     NA  NA    NA     NA    NA     NA
2 2 Red                  NA     NA  NA    NA     NA    NA     NA
3 3 White, Black         NA     NA  NA    NA     NA    NA     NA
4 4 Yellow               NA     NA  NA    NA     NA    NA     NA
5 5 Blue, Orange, Violet NA     NA  NA    NA     NA    NA     NA
I'm trying to fill each colour column with 1 where that colour appears in the row's Colour value and 0 otherwise. The expected output is:
  Colour             Orange Red White Violet Black Yellow Blue
1 Orange,Red         1      1   0     0      0     0      0
2 Red                0      1   0     0      0     0      0
3 White,Black        0      0   1     0      1     0      0
4 Yellow             0      0   0     0      0     1      0
5 Blue,Orange,Violet 1      0   0     1      0     0      1
How can I achieve this in R?
Loop across the column names and check whether each one occurs in the Colour string using grepl:
dat[-(1:2)] <- sapply( colnames(dat[-(1:2)]), grepl, x=dat$Colour ) + 0
# X Colour Orange Red White Violet Black Yellow Blue
#1 1 Orange, Red 1 1 0 0 0 0 0
#2 2 Red 0 1 0 0 0 0 0
#3 3 White, Black 0 0 1 0 1 0 0
#4 4 Yellow 0 0 0 0 0 1 0
#5 5 Blue, Orange, Violet 1 0 0 1 0 0 1
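For a self-contained run, the sketch below rebuilds the sample data by hand and applies the same grepl idea. The construction of `dat` is an assumption about what read.csv returns (with Colour read as character), and `cols` is just a helper name used here. Because grepl does substring matching, a name such as "Red" would also match a hypothetical value like "Reddish"; wrapping each name in `\\b` word boundaries is one possible guard, not part of the answer above.

```r
# Rebuild the sample data by hand (assumed shape of the CSV after read.csv)
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
cols <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")
dat[cols] <- NA  # empty indicator columns, as in the question

# One grepl() call per colour name; \\b limits the match to whole words
dat[cols] <- sapply(cols, function(nm) {
  grepl(paste0("\\b", nm, "\\b"), dat$Colour)
}) + 0  # adding 0 turns the logical matrix into 0/1 numbers
dat
```

If the colour names could contain regex metacharacters, passing `fixed = TRUE` to grepl (without the `\\b` wrapper) would be the safer choice, at the cost of going back to plain substring matching.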
It isn't clear whether the NA columns were already in your data or whether you added them. Even without those placeholder columns, we can use strsplit to split the "Colour" column, apply mtabulate (from qdapTools) to the resulting list and, if needed, reorder the output to match the column names of 'dat':
library(qdapTools)
cbind(dat[1:2], mtabulate(strsplit(dat$Colour, ', ')))[names(dat)]
# X Colour Orange Red White Violet Black Yellow Blue
#1 1 Orange, Red 1 1 0 0 0 0 0
#2 2 Red 0 1 0 0 0 0 0
#3 3 White, Black 0 0 1 0 1 0 0
#4 4 Yellow 0 0 0 0 0 1 0
#5 5 Blue, Orange, Violet 1 0 0 1 0 0 1
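If installing qdapTools is not an option, roughly the same tabulation can be sketched in base R. This is an illustrative alternative, not how mtabulate is implemented; `dat` is rebuilt by hand here, and `cols`, `parts`, `ind` and `res` are helper names assumed for the example.

```r
# Sample data rebuilt by hand, without the NA placeholder columns
dat <- data.frame(
  X = 1:5,
  Colour = c("Orange, Red", "Red", "White, Black", "Yellow",
             "Blue, Orange, Violet"),
  stringsAsFactors = FALSE
)
# Known colours, in the column order wanted in the result
cols <- c("Orange", "Red", "White", "Violet", "Black", "Yellow", "Blue")

# Split each entry on commas, tolerating spaces around them
parts <- strsplit(dat$Colour, "\\s*,\\s*")

# For each row, flag which of the known colours appear among its pieces
ind <- t(sapply(parts, function(p) as.integer(cols %in% p)))
colnames(ind) <- cols

res <- cbind(dat, ind)
res
```

Fixing the column order up front through `cols` avoids the reordering step, but it also means a colour missing from `cols` is silently ignored.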
A similar approach is to use cSplit_e from splitstackshape:
library(splitstackshape)
cSplit_e(dat[1:2], 'Colour', type='character', fill=0)