
Subset data from a list of strings

This is an extension of my previous problem.

I have changed the dataset slightly.

dat2 <- read.table(header=TRUE, text="
ID        De      Ep     Ti   ID1
A1123     A117    121    100  11231
A1123Zmcd A108    C207   D110 E11232
A1124MDN  A122cd  C207   D110 E11232
A1124MDN  A117mnm C207cd D110 E11232
A1124fex  A122cdc C208   D110 E11232
B1125MD   A108vbd C208   D110 E11232
B1125klmc A10cde8 C208   D110 E11232
B1126true A122    C208   D110 E11233
C1126mlk  A109bvc C208cd D111 E11233
")
dat2
         ID      De     Ep   Ti    ID1
1     A1123    A117    121  100  11231
2 A1123Zmcd    A108   C207 D110 E11232
3  A1124MDN  A122cd   C207 D110 E11232
4  A1124MDN A117mnm C207cd D110 E11232
5  A1124fex A122cdc   C208 D110 E11232
6   B1125MD A108vbd   C208 D110 E11232
7 B1125klmc A10cde8   C208 D110 E11232
8 B1126true    A122   C208 D110 E11233
9  C1126mlk A109bvc C208cd D111 E11233

My file of matching strings is below:

dat3 <- read.table(header=TRUE, text="
ID1
mc
MDN
md
true
cd
vb
MAKE
MODEL
")
dat3
    ID1
1    mc
2   MDN
3    md
4  true
5    cd
6    vb
7  MAKE
8 MODEL

Using grep, as in the previous solution, I can get the rows that match one particular string:

dat2[grep(dat3[1, ],dat2$ID),]
         ID      De   Ep   Ti    ID1
2 A1123Zmcd    A108 C207 D110 E11232
7 B1125klmc A10cde8 C208 D110 E11232
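For reference, grep returns the integer positions of the matching elements, while grepl returns one TRUE/FALSE value per element; the logical form is what allows per-column results to be combined with |. A minimal sketch, using the ID column of dat2 typed out as a plain character vector (the names ids, idx and mask are illustrative):

```r
# ID column of dat2 from the question, as a plain character vector
ids <- c("A1123", "A1123Zmcd", "A1124MDN", "A1124MDN", "A1124fex",
         "B1125MD", "B1125klmc", "B1126true", "C1126mlk")

idx  <- grep("mc", ids)   # integer positions of the matching elements
mask <- grepl("mc", ids)  # one TRUE/FALSE per element, same length as ids
print(idx)                # positions 2 and 7, the rows shown above
print(mask)
```

which(mask) recovers the same positions as grep, so the two forms are interchangeable for row subsetting; only the logical form can be OR-ed across columns.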

I want to get the rows of dat2 where any of the strings in dat3 appears in any of the first three columns, like the following:

       ID      De     Ep   Ti    ID1
A1123Zmcd    A108   C207 D110 E11232
 A1124MDN  A122cd   C207 D110 E11232
 A1124MDN A117mnm C207cd D110 E11232
 A1124fex A122cdc   C208 D110 E11232
  B1125MD A108vbd   C208 D110 E11232
B1125klmc A10cde8   C208 D110 E11232
B1126true    A122   C208 D110 E11233
 C1126mlk A109bvc C208cd D111 E11233

We can loop through the columns of 'dat2' using lapply, collapse the 'dat3' column into a single string delimited by |, and use that as the pattern in grepl. The resulting list of logical vectors can be collapsed into a single vector with Reduce, which is then used as the row index to subset 'dat2'.

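# Collapse the dat3 strings into one regex ("mc|MDN|md|...") and test only the first three columns (ID, De, Ep)
dat2[Reduce(`|`, lapply(dat2[1:3], function(x) grepl(paste(dat3[[1]], collapse='|'), x))),]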
#         ID      De     Ep   Ti    ID1
#2 A1123Zmcd    A108   C207 D110 E11232
#3  A1124MDN  A122cd   C207 D110 E11232
#4  A1124MDN A117mnm C207cd D110 E11232
#5  A1124fex A122cdc   C208 D110 E11232
#6   B1125MD A108vbd   C208 D110 E11232
#7 B1125klmc A10cde8   C208 D110 E11232
#8 B1126true    A122   C208 D110 E11233
#9  C1126mlk A109bvc C208cd D111 E11233
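The one-liner can be unpacked into its three steps. The sketch below recreates the question's data, restricts the search to the first three columns as the question specifies, and adds a fixed = TRUE variant (not part of the answer above) that treats each dat3 entry as a literal string rather than a regular expression, which only matters if the entries could contain characters such as . or +. The names pat, hits, keep, res and keep_fixed are illustrative.

```r
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID        De      Ep     Ti   ID1
A1123     A117    121    100  11231
A1123Zmcd A108    C207   D110 E11232
A1124MDN  A122cd  C207   D110 E11232
A1124MDN  A117mnm C207cd D110 E11232
A1124fex  A122cdc C208   D110 E11232
B1125MD   A108vbd C208   D110 E11232
B1125klmc A10cde8 C208   D110 E11232
B1126true A122    C208   D110 E11233
C1126mlk  A109bvc C208cd D111 E11233
")
dat3 <- data.frame(ID1 = c("mc", "MDN", "md", "true", "cd", "vb", "MAKE", "MODEL"),
                   stringsAsFactors = FALSE)

# Step 1: one alternation regex built from all patterns
pat <- paste(dat3$ID1, collapse = "|")

# Step 2: one logical vector per column (ID, De, Ep)
hits <- lapply(dat2[1:3], function(x) grepl(pat, x))

# Step 3: OR the vectors element-wise; TRUE if any column matched
keep <- Reduce(`|`, hits)
res <- dat2[keep, ]
print(res)  # rows 2 to 9

# Variant: literal (non-regex) matching, one pattern at a time
keep_fixed <- Reduce(`|`, lapply(dat3$ID1, function(p)
  Reduce(`|`, lapply(dat2[1:3], function(x) grepl(p, x, fixed = TRUE)))))
```

For these patterns both masks are identical, because none of the dat3 entries contains a regex metacharacter. Note that matching is case-sensitive either way: "md" does not match "B1125MD", which is kept only because its De value contains "vb".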

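Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.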

 
© 2020-2024 STACKOOM.COM