I'd like to identify all elements of a character vector that match any of an array of patterns. How do I do this? I'd like to avoid clunky for-loops, because I'd like the result to be invariant to the order in which I specify the patterns.
Here is a simple (non-working) example.
regex = c('a','b')
words = c('goat','sheep','banana','aardvark','cow','bird')
grepl(regex,words)
[1] TRUE FALSE TRUE TRUE FALSE FALSE
Warning message:
In grepl(regex, words) :
argument 'pattern' has length > 1 and only the first element will be used
EDIT: Sorry, realized that I've seen the answer to this before and just forgotten it -- it'd be grepl('(a)|(b)',words), but I'd need some way of coercing the array into that form.
Use sapply:
> sapply(regex, grepl, words)
a b
[1,] TRUE FALSE
[2,] FALSE FALSE
[3,] TRUE TRUE
[4,] TRUE FALSE
[5,] FALSE FALSE
[6,] FALSE TRUE
The original question suggested that the above was what was wanted, but it was then changed to ask for those elements which contain any element of regex. In that case:
> grepl(paste(regex, collapse = "|"), words)
[1] TRUE FALSE TRUE TRUE FALSE TRUE
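As a minimal sketch of the same idea, assuming the regex and words vectors from the question, the snippet below builds the alternation once, checks that reversing the pattern order does not change the result, and pulls out the matching words. The variable names any_pattern, any_match and any_match_rev are only illustrative.

```r
regex <- c("a", "b")
words <- c("goat", "sheep", "banana", "aardvark", "cow", "bird")

# Join the patterns with "|" so one regex matches any of them.
any_pattern <- paste(regex, collapse = "|")  # "a|b"
any_match <- grepl(any_pattern, words)

# Reversing the pattern order ("b|a") gives the same logical vector.
any_match_rev <- grepl(paste(rev(regex), collapse = "|"), words)
identical(any_match, any_match_rev)
# [1] TRUE

# Keep only the words that matched at least one pattern.
words[any_match]
# [1] "goat"     "banana"   "aardvark" "bird"
```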
You could do it in the regular expression itself with a look-ahead. Here's an example of stitching the regular expression together from your search terms (a AND b should only match banana). Make sure to set perl = TRUE to enable the (?=...) look-ahead in your regexp. It should work for more complicated patterns as well; take a look at this tutorial for details on the look-ahead.
search <- c('a','b')
words <- c('goat','sheep','banana','aardvark','cow','bird')
regex <- paste(paste0("(?=.*", search, ")"), collapse = "")
matches <- grepl(regex, words, perl = TRUE)
print(data.frame(words, matches))
UPDATE: this is for the original question of matching ALL search terms; matching ANY search term can be achieved as indicated in the edit to the original question.
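If the look-ahead feels opaque, a rough alternative for the ALL case is to test each term separately and combine the results with a logical AND. This is only a sketch under the same search and words vectors as above; per_term and all_match are illustrative names, and the last line cross-checks against the look-ahead regex.

```r
search <- c("a", "b")
words <- c("goat", "sheep", "banana", "aardvark", "cow", "bird")

# One logical vector per search term, each as long as `words`.
per_term <- lapply(search, grepl, x = words)

# Element-wise AND across terms: TRUE only where every term matched.
all_match <- Reduce(`&`, per_term)
words[all_match]
# [1] "banana"

# Cross-check against the look-ahead version.
lookahead <- paste(paste0("(?=.*", search, ")"), collapse = "")
identical(all_match, grepl(lookahead, words, perl = TRUE))
# [1] TRUE
```

Because AND is commutative, the result does not depend on the order of the search terms.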
Some time back, I wrote a function called needleInHaystack that can be used as follows:
x <- needleInHaystack(regex, words)
x
# a b
# goat 1 0
# sheep 0 0
# banana 1 1
# aardvark 1 0
# cow 0 0
# bird 0 1
Depending on whether you want all or any, it's easy to use apply (or rowSums).
apply(x, 1, function(x) any(as.logical(x)))
# goat sheep banana aardvark cow bird
# TRUE FALSE TRUE TRUE FALSE TRUE
apply(x, 1, function(x) all(as.logical(x)))
# goat sheep banana aardvark cow bird
# FALSE FALSE TRUE FALSE FALSE FALSE
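The rowSums variant is not spelled out above, so here is a rough sketch of it. Since needleInHaystack itself is not shown in this thread, the matrix x below is a hand-built stand-in that copies the 0/1 output printed above; any_hit and all_hit are illustrative names.

```r
# Stand-in for the needleInHaystack() result shown above (filled column by column).
x <- matrix(
  c(1, 0, 1, 1, 0, 0,   # column "a"
    0, 0, 1, 0, 0, 1),  # column "b"
  ncol = 2,
  dimnames = list(
    c("goat", "sheep", "banana", "aardvark", "cow", "bird"),
    c("a", "b")
  )
)

# ANY: at least one pattern hit in the row.
any_hit <- rowSums(x) > 0

# ALL: every pattern hit in the row.
all_hit <- rowSums(x) == ncol(x)

names(which(any_hit))
# [1] "goat"     "banana"   "aardvark" "bird"
names(which(all_hit))
# [1] "banana"
```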
It's designed for finding things even out of order. So, for example, "to" would match "goat". Not sure if that's a behavior you would want for your problem though.