R: Finding multiple string matches in a vector of strings

I have the following list of file names:

files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx", "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx", "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx", "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")

I want to identify which files contain the following sub-strings:

toMatch <- c("Fasted", "DWeib NoCmaxW")

The examples I have found often quote the following usage:

grep(paste(toMatch, collapse = "|"), files.list, value=TRUE)

However, this returns five file names, each of which contains at least one of the two sub-strings:

[1] "Fasted DWeib NoCmaxW.xlsx" "Fed DWeib NoCmaxW.xlsx"    "Fasted SWeib NoCmaxW.xlsx"
[4] "Fasted DWeib Cmax10.xlsx"  "Fasted SWeib Cmax10.xlsx" 

I want the filename that contains both elements of toMatch (i.e. "Fasted" and "DWeib NoCmaxW"). Only one file satisfies that requirement (files.list[1]). I assumed the "|" in the paste command might be a logical OR, so I tried "&" instead, but that didn't solve my problem.

Can someone please help?

Thank you.

We can use & to combine the results of one grepl call per sub-string:

i1 <- grepl(toMatch[1], files.list) & grepl(toMatch[2], files.list)

If there are more elements in 'toMatch', loop through them with lapply and Reduce the results to a single logical vector with &:

i1 <- Reduce(`&`, lapply(toMatch, grepl, x = files.list))
files.list[i1]
#[1] "Fasted DWeib NoCmaxW.xlsx"
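
The same idea can be wrapped in a small helper. This is only a sketch: match_all is a hypothetical name that does not appear in the answer above, and passing fixed = TRUE assumes the elements of 'toMatch' are literal strings rather than regular expressions.

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# Hypothetical helper: keep the elements of x that match every pattern.
# Extra arguments (e.g. fixed = TRUE) are passed on to grepl.
match_all <- function(patterns, x, ...) {
  # One logical vector per pattern, combined element-wise with &.
  # The all-TRUE starting value means an empty 'patterns' keeps everything.
  keep <- Reduce(`&`, lapply(patterns, grepl, x = x, ...), rep(TRUE, length(x)))
  x[keep]
}

# fixed = TRUE treats each pattern as a literal string (an assumption here).
res_all <- match_all(toMatch, files.list, fixed = TRUE)
res_all
```

Because each sub-string is tested separately, the order in which the sub-strings occur in a file name does not matter with this approach.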

It is also possible to collapse the elements with .*, i.e. to match the first element of 'toMatch' followed by a word boundary (\\b), then any characters (.*), and another word boundary (\\b) before the second element of 'toMatch'. This works for this example. It may be better to add a word boundary at the start and end of the pattern as well (not needed for this example).

pat1 <- paste(toMatch, collapse= "\\b.*\\b")
grep(pat1, files.list, value = TRUE)
#[1] "Fasted DWeib NoCmaxW.xlsx"
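
For illustration, the variant with a word boundary at the start and end could look like the sketch below. The names pat1b and extra are made up for this example, and the two additional file names are hypothetical; they are not part of the original data.

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# Pattern from the answer: boundaries only between the two elements.
pat1 <- paste(toMatch, collapse = "\\b.*\\b")

# Variant with an extra word boundary at the very start and the very end.
pat1b <- paste0("\\b", pat1, "\\b")

# Hypothetical file names in which a sub-string is embedded in a longer word.
extra <- c("PreFasted DWeib NoCmaxW.xlsx", "Fasted DWeib NoCmaxW2.xlsx")

loose  <- grep(pat1,  c(files.list, extra), value = TRUE)
strict <- grep(pat1b, c(files.list, extra), value = TRUE)
loose   # pat1 also accepts the two hypothetical names
strict  # pat1b keeps only the exact-word match
```

Whether the stricter pattern is wanted depends on how the real file names are built, so treat the outer boundaries as optional.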

However, this only finds matches in which the sub-strings occur in the same order as in 'toMatch'. If the sub-strings may also occur in reverse order and you want to match those as well, create a second pattern in the reverse order and join the two patterns with |:

pat2 <- paste(rev(toMatch), collapse="\\b.*\\b")
pat <- paste(pat1, pat2, sep="|")
grep(pat, files.list, value = TRUE) 
#[1] "Fasted DWeib NoCmaxW.xlsx"
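
With more than two elements, listing every ordering quickly becomes impractical. One alternative, which is not part of the answer above, is a pattern made of one lookahead per element, used with perl = TRUE. The sketch below uses the hypothetical name pat_all and assumes the elements of 'toMatch' contain no regex metacharacters that would need escaping.

```r
files.list <- c("Fasted DWeib NoCmaxW.xlsx", "Fed DWeib NoCmaxW.xlsx",
                "Fasted SWeib NoCmaxW.xlsx", "Fed SWeib NoCmaxW.xlsx",
                "Fasted DWeib Cmax10.xlsx", "Fed DWeib Cmax10.xlsx",
                "Fasted SWeib Cmax10.xlsx", "Fed SWeib Cmax10.xlsx")
toMatch <- c("Fasted", "DWeib NoCmaxW")

# One lookahead per element: "(?=.*Fasted)(?=.*DWeib NoCmaxW)".
# Each lookahead must succeed from the same position, so all elements
# have to be present, in any order.
pat_all <- paste0("(?=.*", toMatch, ")", collapse = "")

# Lookaheads need the PCRE engine, hence perl = TRUE.
res_any_order <- grep(pat_all, files.list, value = TRUE, perl = TRUE)
res_any_order
```

Reversing 'toMatch' yields a different pattern string but selects the same file, since the lookaheads do not constrain the order.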
