
R: Finding multiple string matches in a vector of strings

I have the following list of file names:

files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx", "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx", "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx", "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")

I want to identify which files have the following sub-strings:

toMatch <- c("Fasted", "DWeib NoCmaxW")

The examples I have found often quote the following usage:

grep(paste(toMatch, collapse = "|"), files.list, value=TRUE)

However, this returns five matches, because the pattern matches any file containing either sub-string:

[1] "Fasted DWeib NoCmaxW.xlsx" "Fed DWeib NoCmaxW.xlsx"    "Fasted SWeib NoCmaxW.xlsx"
[4] "Fasted DWeib Cmax10.xlsx"  "Fasted SWeib Cmax10.xlsx" 

I want the filename that contains both elements of toMatch (i.e. "Fasted" and "DWeib NoCmaxW"). Only one file satisfies that requirement (files.list[1]). I assumed the "|" in the paste command acts as a logical OR, so I tried "&" instead, but that didn't solve my problem.

Can someone please help?

Thank you.

We can run grepl once per sub-string and combine the two logical vectors with &

i1 <- grepl(toMatch[1], files.list) & grepl(toMatch[2], files.list)

If 'toMatch' has more than two elements, loop over them with lapply and use Reduce with & to collapse the results into a single logical vector

i1 <- Reduce(`&`, lapply(toMatch, grepl, x = files.list))
files.list[i1]
#[1] "Fasted DWeib NoCmaxW.xlsx"
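The patterns above are treated as regular expressions, so a sub-string containing a metacharacter such as "." could match more than intended. A minimal sketch of the same Reduce approach with literal matching, assuming the sub-strings are meant as plain text (match_all is a hypothetical helper name, not a base R function):

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# match_all() is a hypothetical helper, introduced here for illustration only.
# It keeps the elements of x that contain every entry of patterns.
match_all <- function(patterns, x, fixed = TRUE) {
  # One logical vector per pattern; fixed = TRUE disables regex interpretation
  hits <- lapply(patterns, grepl, x = x, fixed = fixed)
  # Element-wise AND across all logical vectors
  x[Reduce(`&`, hits)]
}

result_fixed <- match_all(toMatch, files.list)
result_fixed
#[1] "Fasted DWeib NoCmaxW.xlsx"
```

With fixed = TRUE the order of the sub-strings does not matter, and no escaping is needed.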

Another option is to collapse the elements with .* so that the pattern matches the first element of 'toMatch', a word boundary ( \\b ), any characters ( .* ), another word boundary ( \\b ), and then the second element of 'toMatch'. This works for the example. It may be safer to also add a word boundary at the start and end of the pattern, although that is not needed here.

pat1 <- paste(toMatch, collapse= "\\b.*\\b")
grep(pat1, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
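The suggestion to add word boundaries at the start and end can be sketched as follows. The file name "NonFasted DWeib NoCmaxW.xlsx" is a hypothetical addition, used only to show a case where the extra boundaries change the result:

```r
toMatch <- c("Fasted", "DWeib NoCmaxW")

# Two names: one from the question, one hypothetical name where "Fasted"
# appears inside a longer word
extra.files <- c("Fasted DWeib NoCmaxW.xlsx", "NonFasted DWeib NoCmaxW.xlsx")

# Pattern from the answer: boundaries only between the two sub-strings
pat_loose <- paste(toMatch, collapse = "\\b.*\\b")

# Same pattern with a word boundary added at the start and at the end
pat_strict <- paste0("\\b", pat_loose, "\\b")

loose_hits <- grep(pat_loose, extra.files, value = TRUE)
strict_hits <- grep(pat_strict, extra.files, value = TRUE)

loose_hits   # both names match, since "Fasted" is found inside "NonFasted"
strict_hits  # only "Fasted DWeib NoCmaxW.xlsx" matches
```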

However, this only finds matches where the sub-strings appear in the same order as in 'toMatch'. If the sub-strings may also appear in reverse order, build a second pattern from the reversed vector and join the two patterns with |

pat2 <- paste(rev(toMatch), collapse="\\b.*\\b")
pat <- paste(pat1, pat2, sep="|")
grep(pat, files.list, value = TRUE) 
#[1] "Fasted DWeib NoCmaxW.xlsx"
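Listing every ordering becomes unwieldy with more than two sub-strings. An alternative that is not part of the answer above is a single pattern built from lookahead assertions, which requires perl = TRUE. A minimal sketch, assuming the sub-strings contain no regex metacharacters:

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# One lookahead per sub-string, all anchored at the start of the string:
# "^(?=.*Fasted)(?=.*DWeib NoCmaxW)"
lookahead_pat <- paste0("^", paste0("(?=.*", toMatch, ")", collapse = ""))

# Lookaheads are a PCRE feature, so perl = TRUE is needed
lookahead_hits <- grep(lookahead_pat, files.list, value = TRUE, perl = TRUE)
lookahead_hits
#[1] "Fasted DWeib NoCmaxW.xlsx"
```

Each lookahead checks for its sub-string independently, so the order of the elements in 'toMatch' does not affect the result.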


 