
R: subset matched words

Given these two vectors:

a <- c("Look at the sky", "Sun is bright", "cloudy day")
b <- c("sky", "day")

I want to subset a, keeping only the elements that contain at least one of the words in b. The desired result is:

"Look at the sky", "cloudy day"

How can I do this in R?

Option 1

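You can test every element of a against each term in b with sapply, which returns a logical matrix with one row per element of a and one column per term: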

sapply(b, grepl, a)

       sky   day
[1,]  TRUE FALSE
[2,] FALSE FALSE
[3,] FALSE  TRUE

Then collapse each row with apply(..., 1, any), which is TRUE when at least one term matched, and use the result to subset a:

a[apply(sapply(b, grepl, a), 1, any)]

[1] "Look at the sky" "cloudy day"     
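One caveat with Option 1: apply() needs a matrix, but when a has only one element sapply() simplifies its result to a plain vector and apply() stops with an error. A minimal sketch that sidesteps this is to OR the per-term logical vectors together with Reduce(). Passing fixed = TRUE is an addition not in the answer above; it makes each term match literally rather than as a regular expression.

```r
a <- c("Look at the sky", "Sun is bright", "cloudy day")
b <- c("sky", "day")

# One logical vector per term: does each element of a contain that term?
# Each element of b is passed as grepl's `pattern`; fixed = TRUE matches it literally.
hits_per_term <- lapply(b, grepl, x = a, fixed = TRUE)

# Element-wise OR across the terms: TRUE where any term matched
keep <- Reduce(`|`, hits_per_term)

a[keep]
## [1] "Look at the sky" "cloudy day"

# Also works when a has a single element, where apply() would fail
a_one <- "cloudy day"
a_one[Reduce(`|`, lapply(b, grepl, x = a_one, fixed = TRUE))]
## [1] "cloudy day"
```

Because no intermediate matrix is built, the result does not depend on how sapply() chooses to simplify its output.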

Option 2

Create a combined regular expression by joining the terms with | (alternation):

paste(b, collapse="|")

[1] "sky|day"

and match against it with grepl:

a[grepl(paste(b, collapse="|"), a)]

[1] "Look at the sky" "cloudy day"     
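Both options match substrings, so "sky" would also keep a string such as "skyline view", and in Option 2 any regex metacharacters in b (for example "." or "+") are interpreted as regex syntax. If whole-word, literal matching is what you want (an assumption beyond the question), one possible sketch wraps each term in PCRE's \Q...\E literal quoting and the whole alternation in \b word boundaries, then calls grepl with perl = TRUE. The extra test strings "skyline view" and "holiday" are made up for illustration.

```r
a <- c("Look at the sky", "Sun is bright", "cloudy day", "skyline view", "holiday")
b <- c("sky", "day")

# \Q...\E makes PCRE treat each term literally (assumes no term contains "\E");
# \b on both sides requires the matched term to be a whole word.
pattern <- paste0("\\b(?:", paste0("\\Q", b, "\\E", collapse = "|"), ")\\b")
pattern
## [1] "\\b(?:\\Qsky\\E|\\Qday\\E)\\b"

# "skyline view" and "holiday" contain the terms only as parts of longer words
a[grepl(pattern, a, perl = TRUE)]
## [1] "Look at the sky" "cloudy day"
```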

Alternatively, try the string-searching functions from the stringi package:

library(stringi)
a[sapply(a, function(ae) any(stri_detect_fixed(ae, b)))]
## [1] "Look at the sky" "cloudy day"

Here we check whether each string in a contains any string in b as a substring, using fixed (literal) matching rather than a regular expression.
