Subset data from a list of strings
This is an extension of my previous problem.
I have changed the dataset slightly.
dat2 <- read.table(header=TRUE, text="
ID De Ep Ti ID1
A1123 A117 121 100 11231
A1123Zmcd A108 C207 D110 E11232
A1124MDN A122cd C207 D110 E11232
A1124MDN A117mnm C207cd D110 E11232
A1124fex A122cdc C208 D110 E11232
B1125MD A108vbd C208 D110 E11232
B1125klmc A10cde8 C208 D110 E11232
B1126true A122 C208 D110 E11233
C1126mlk A109bvc C208cd D111 E11233
")
dat2
#   ID De Ep Ti ID1
#1 A1123 A117 121 100 11231
#2 A1123Zmcd A108 C207 D110 E11232
#3 A1124MDN A122cd C207 D110 E11232
#4 A1124MDN A117mnm C207cd D110 E11232
#5 A1124fex A122cdc C208 D110 E11232
#6 B1125MD A108vbd C208 D110 E11232
#7 B1125klmc A10cde8 C208 D110 E11232
#8 B1126true A122 C208 D110 E11233
#9 C1126mlk A109bvc C208cd D111 E11233
My data frame of strings to match is below:
dat3 <- read.table(header=TRUE, text="
ID1
mc
MDN
md
true
cd
vb
MAKE
MODEL
")
dat3
#   ID1
#1 mc
#2 MDN
#3 md
#4 true
#5 cd
#6 vb
#7 MAKE
#8 MODEL
Using grep, as in the previous solution, I can get the rows that match one particular string:
dat2[grep(dat3[1, ],dat2$ID),]
#   ID De Ep Ti ID1
#2 A1123Zmcd A108 C207 D110 E11232
#7 B1125klmc A10cde8 C208 D110 E11232
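grep returns the positions of the matching elements, which is enough for one pattern on one column. To combine matches across several columns, as the answer below does, one logical vector per column is more convenient, and grepl returns exactly that. A small illustration on the ID values from the question (copied into a plain vector named ids, a name chosen only for this example):

```r
# ID column from the question's dat2, copied into a plain character vector
ids <- c("A1123", "A1123Zmcd", "A1124MDN", "A1124MDN", "A1124fex",
         "B1125MD", "B1125klmc", "B1126true", "C1126mlk")

idx <- grep("mc", ids)   # integer positions of matching elements: 2 7
lgl <- grepl("mc", ids)  # one TRUE/FALSE per element, same length as ids

# Logical vectors from different columns can be OR-ed together; positions cannot
idx
lgl
```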
I want to get the rows of dat2 in which any of the strings in dat3 appears in any of the first three columns, like the following:
ID De Ep Ti ID1
A1123Zmcd A108 C207 D110 E11232
A1124MDN A122cd C207 D110 E11232
A1124MDN A117mnm C207cd D110 E11232
A1124fex A122cdc C208 D110 E11232
B1125MD A108vbd C208 D110 E11232
B1125klmc A10cde8 C208 D110 E11232
B1126true A122 C208 D110 E11233
C1126mlk A109bvc C208cd D111 E11233
We can loop through the columns of 'dat2' using lapply, collapse the 'dat3' column into a single string delimited by | and use that as the pattern in grepl. The resulting list of logical vectors can be collapsed into a single vector with Reduce (using | as an element-wise OR, so a row is TRUE if any column matched), and that vector is used as the row index to subset 'dat2'.
dat2[Reduce(`|`,lapply(dat2, function(x) grepl(paste(dat3[[1]], collapse='|'), x))),]
# ID De Ep Ti ID1
#2 A1123Zmcd A108 C207 D110 E11232
#3 A1124MDN A122cd C207 D110 E11232
#4 A1124MDN A117mnm C207cd D110 E11232
#5 A1124fex A122cdc C208 D110 E11232
#6 B1125MD A108vbd C208 D110 E11232
#7 B1125klmc A10cde8 C208 D110 E11232
#8 B1126true A122 C208 D110 E11233
#9 C1126mlk A109bvc C208cd D111 E11233
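The question asks only about the first three columns (ID, De, Ep), while the code above checks every column of dat2. Below is a minimal sketch that restricts the search to dat2[1:3] and builds the pattern once instead of once per column. It assumes the column order shown in the question; the names pat, hits and res are illustrative only. On this sample data it returns the same rows (2 to 9), because none of the strings occur in Ti or ID1.

```r
# Sample data from the question
dat2 <- read.table(header = TRUE, text = "
ID De Ep Ti ID1
A1123 A117 121 100 11231
A1123Zmcd A108 C207 D110 E11232
A1124MDN A122cd C207 D110 E11232
A1124MDN A117mnm C207cd D110 E11232
A1124fex A122cdc C208 D110 E11232
B1125MD A108vbd C208 D110 E11232
B1125klmc A10cde8 C208 D110 E11232
B1126true A122 C208 D110 E11233
C1126mlk A109bvc C208cd D111 E11233
")
dat3 <- read.table(header = TRUE, text = "
ID1
mc
MDN
md
true
cd
vb
MAKE
MODEL
")

# One alternation pattern, "mc|MDN|md|true|cd|vb|MAKE|MODEL", built once
pat <- paste(dat3$ID1, collapse = "|")

# One logical vector per column, for the first three columns only (ID, De, Ep)
hits <- lapply(dat2[1:3], function(x) grepl(pat, x))

# Element-wise OR across those columns: keep a row if any of them matched
res <- dat2[Reduce(`|`, hits), ]
res
```

If the strings in dat3 could contain regular-expression metacharacters (for example . or +), they would need escaping before being joined with |, because grepl treats the combined pattern as a regular expression.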