Regular expression with an array of patterns (in R)

Question

I'd like to identify all elements of a character vector that match an array of patterns. How do I do this? I'd like to avoid clunky for-loops, because I want the result to be invariant to the order in which I specify the patterns.

Here is a simple (non-working) example.

regex = c('a','b')
words = c('goat','sheep','banana','aardvark','cow','bird')
grepl(regex,words)
[1]  TRUE FALSE  TRUE  TRUE FALSE FALSE
Warning message:
In grepl(regex, words) :
  argument 'pattern' has length > 1 and only the first element will be used

EDIT: Sorry, I realized that I've seen the answer to this before and had just forgotten it: it would be grepl('(a)|(b)',words), but I'd need some way of coercing the array into that form.

Answer 1

Use sapply:

> sapply(regex, grepl, words)
         a     b
[1,]  TRUE FALSE
[2,] FALSE FALSE
[3,]  TRUE  TRUE
[4,]  TRUE FALSE
[5,] FALSE FALSE
[6,] FALSE  TRUE

The original question suggested that the above was what was wanted, but it was then changed to ask for those elements that contain any element of regex. In that case:

> grepl(paste(regex, collapse = "|"), words)
[1]  TRUE FALSE  TRUE  TRUE FALSE  TRUE
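
As a rough sketch of how the alternation pattern might be used in practice, the snippet below subsets words with the logical result and checks that reversing the patterns does not change it. The variable names (any_pattern, matched_words, and so on) are illustrative and not part of the original answer.

```r
regex <- c("a", "b")
words <- c("goat", "sheep", "banana", "aardvark", "cow", "bird")

# One alternation pattern: TRUE where a word matches at least one pattern.
any_pattern <- paste(regex, collapse = "|")
any_match <- grepl(any_pattern, words)

# Subset the original vector with the logical result.
matched_words <- words[any_match]

# Reversing the pattern order should leave the result unchanged.
any_match_rev <- grepl(paste(rev(regex), collapse = "|"), words)

# The same answer derived from the sapply() matrix: any TRUE in a row.
hits <- sapply(regex, grepl, words)
any_from_matrix <- rowSums(hits) > 0
```

Note that paste() joins the patterns verbatim, so any regex metacharacters inside them keep their special meaning in the combined pattern.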

Answer 2

You could do it in the regular expression itself with a look-ahead. Here's an example of stitching the regular expression together from your search terms (a AND b should match only banana; make sure to set perl = TRUE to enable the (?=...) lookahead in your regexp). It should work for more complicated patterns as well; take a look at this tutorial for details on the look-ahead.

search <- c('a','b')
words <- c('goat','sheep','banana','aardvark','cow','bird')
regex <- paste(paste0("(?=.*", search, ")"), collapse = "")
matches <- grepl(regex, words, perl = TRUE)
print(data.frame(words, matches))

UPDATE: this is for the original question of matching ALL search terms; matching ANY search term can be achieved as indicated in the edit to the original question.
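
To make the ALL-terms behaviour easier to check, here is a small sketch that wraps the look-ahead construction in a helper and compares it against matching each term separately. build_all_regex is a hypothetical name introduced only for this illustration.

```r
search <- c("a", "b")
words <- c("goat", "sheep", "banana", "aardvark", "cow", "bird")

# Build "(?=.*a)(?=.*b)": each look-ahead asserts that one term occurs
# somewhere in the string, so the order of the terms does not matter.
build_all_regex <- function(terms) {
  paste(paste0("(?=.*", terms, ")"), collapse = "")
}

all_lookahead <- grepl(build_all_regex(search), words, perl = TRUE)
all_lookahead_rev <- grepl(build_all_regex(rev(search)), words, perl = TRUE)

# Cross-check: run each term separately and AND the logical vectors.
all_separate <- Reduce(`&`, lapply(search, grepl, x = words))

print(data.frame(words, all_lookahead, all_separate))
```

The Reduce() variant avoids perl = TRUE altogether, which may be preferable if the individual patterns are themselves complicated.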

Answer 3

Some time back, I wrote a function called needleInHaystack that can be used as follows:

x <- needleInHaystack(regex, words)
x
#          a b
# goat     1 0
# sheep    0 0
# banana   1 1
# aardvark 1 0
# cow      0 0
# bird     0 1

Depending on whether you want all or any, it's easy to use apply (or rowSums).

apply(x, 1, function(x) any(as.logical(x)))
#     goat    sheep   banana aardvark      cow     bird 
#     TRUE    FALSE     TRUE     TRUE    FALSE     TRUE 
apply(x, 1, function(x) all(as.logical(x)))
#     goat    sheep   banana aardvark      cow     bird 
#    FALSE    FALSE     TRUE    FALSE    FALSE    FALSE

It's designed for finding things even out of order. So, for example, "to" would match "goat". Not sure if that's the behavior you want for your problem, though.
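
The source of needleInHaystack is not shown here. For readers who want to reproduce the 0/1 matrix and the rowSums reductions, a rough stand-in might look like the sketch below. It assumes ordinary regex matching via grepl, so it does not implement the out-of-order matching described above; pattern_matrix is a hypothetical name, not the author's function.

```r
regex <- c("a", "b")
words <- c("goat", "sheep", "banana", "aardvark", "cow", "bird")

# Hypothetical helper (NOT needleInHaystack): one row per word, one column
# per pattern, 1 where the pattern matches the word as an ordinary regex.
pattern_matrix <- function(patterns, haystack) {
  hits <- vapply(patterns, grepl, logical(length(haystack)), x = haystack)
  matrix(as.integer(hits), nrow = length(haystack),
         dimnames = list(haystack, patterns))
}

x <- pattern_matrix(regex, words)

any_hit <- rowSums(x) > 0         # at least one pattern matches
all_hit <- rowSums(x) == ncol(x)  # every pattern matches
```

Using rowSums here is a vectorised alternative to the apply calls shown earlier in this answer.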

Answer 1
1 (accepted) 2014-04-12 18:09:56

Answer 2
1 2014-04-12 18:14:31

Answer 3
0 2014-04-13 03:35:09
