
Subset data from a list of Strings

This is an extension of my previous problem.

I have changed the dataset slightly.

dat2 <- read.table(header=TRUE, text="
ID  De  Ep  Ti  ID1
A1123    A117 121 100 11231
                   A1123Zmcd A108 C207 D110 E11232
                   A1124MDN A122cd C207 D110 E11232
                   A1124MDN A117mnm C207cd D110 E11232
                   A1124fex A122cdc C208 D110 E11232
                   B1125MD A108vbd C208 D110 E11232
                   B1125klmc A10cde8 C208 D110 E11232
                   B1126true A122 C208 D110 E11233
                   C1126mlk A109bvc C208cd D111 E11233
                   ")
dat2
         ID      De     Ep   Ti    ID1
1     A1123    A117    121  100  11231
2 A1123Zmcd    A108   C207 D110 E11232
3  A1124MDN  A122cd   C207 D110 E11232
4  A1124MDN A117mnm C207cd D110 E11232
5  A1124fex A122cdc   C208 D110 E11232
6   B1125MD A108vbd   C208 D110 E11232
7 B1125klmc A10cde8   C208 D110 E11232
8 B1126true    A122   C208 D110 E11233
9  C1126mlk A109bvc C208cd D111 E11233

My matching string file is below:

dat3 <- read.table(header=TRUE, text="
ID1  
mc    
                   MDN 
                   md 
                   true 
                   cd
                   vb
                   MAKE
                   MODEL
                   ")
dat3
    ID1
1    mc
2   MDN
3    md
4  true
5    cd
6    vb
7  MAKE
8 MODEL

Using grep, as in the previous solution, I can get the rows that match one particular string:

dat2[grep(dat3[1, ],dat2$ID),]
         ID      De   Ep   Ti    ID1
2 A1123Zmcd    A108 C207 D110 E11232
7 B1125klmc A10cde8 C208 D110 E11232

I want to get the rows of dat2 in which any of the strings in dat3 appears in any of the first three columns, like the following.

         ID      De     Ep   Ti    ID1
  A1123Zmcd    A108   C207 D110 E11232
   A1124MDN  A122cd   C207 D110 E11232
   A1124MDN A117mnm C207cd D110 E11232
   A1124fex A122cdc   C208 D110 E11232
    B1125MD A108vbd   C208 D110 E11232
  B1125klmc A10cde8   C208 D110 E11232
  B1126true    A122   C208 D110 E11233
   C1126mlk A109bvc C208cd D111 E11233

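We can loop through the columns of 'dat2' with lapply. Collapsing the 'dat3' column into a single string delimited by | gives a regular expression that matches any of the strings, and we use that as the pattern in grepl. The resulting list of logical vectors (one per column) is combined element-wise into a single vector with Reduce, and that vector is used as the row index to subset 'dat2'. Note that this checks every column of 'dat2', not only the first three; in this dataset the last two columns contain no matches, so the result is the same.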

dat2[Reduce(`|`,lapply(dat2, function(x) grepl(paste(dat3[[1]], collapse='|'), x))),]
#         ID      De     Ep   Ti    ID1
#2 A1123Zmcd    A108   C207 D110 E11232
#3  A1124MDN  A122cd   C207 D110 E11232
#4  A1124MDN A117mnm C207cd D110 E11232
#5  A1124fex A122cdc   C208 D110 E11232
#6   B1125MD A108vbd   C208 D110 E11232
#7 B1125klmc A10cde8   C208 D110 E11232
#8 B1126true    A122   C208 D110 E11233
#9  C1126mlk A109bvc C208cd D111 E11233
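
If the search really should be limited to the first three columns, the same idea can be written out step by step. This is only a sketch under a few assumptions: the data frames are rebuilt here so the block runs on its own, 'dat3' is created with data.frame rather than read.table, and the names pattern, hits, keep and result are just illustrative.

```r
# Rebuild the example data (stringsAsFactors = FALSE keeps the columns as
# character on R versions before 4.0).
dat2 <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "
ID De Ep Ti ID1
A1123 A117 121 100 11231
A1123Zmcd A108 C207 D110 E11232
A1124MDN A122cd C207 D110 E11232
A1124MDN A117mnm C207cd D110 E11232
A1124fex A122cdc C208 D110 E11232
B1125MD A108vbd C208 D110 E11232
B1125klmc A10cde8 C208 D110 E11232
B1126true A122 C208 D110 E11233
C1126mlk A109bvc C208cd D111 E11233
")
dat3 <- data.frame(
  ID1 = c("mc", "MDN", "md", "true", "cd", "vb", "MAKE", "MODEL"),
  stringsAsFactors = FALSE
)

# One alternation pattern that matches any of the strings in dat3.
pattern <- paste(dat3$ID1, collapse = "|")

# Search only the first three columns; each list element is a logical
# vector with one entry per row of dat2.
hits <- lapply(dat2[1:3], function(col) grepl(pattern, col))

# Keep a row if at least one of the three columns matched.
keep <- Reduce(`|`, hits)

result <- dat2[keep, ]
result
```

The matching here is case-sensitive and treats each string in 'dat3' as a regular expression. If a string could contain regex metacharacters (such as . or +), it would need escaping before being pasted into the pattern, and ignore.case = TRUE can be passed to grepl if case should not matter.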


 