
Finding a value in R that contains a certain string

Is there a way in R to find the values in a column that contain a given word? For example, I want to find all values that contain the word "the", where the column holds values such as "the_cat", "the_dog", and "dog".

x <- c("the_dog", "the_cat", "dog")

Using the example above, the answer would be 2. I know this is relatively easy to do in Python, but I am wondering if there is a way to do this in R. Thanks!

Try:

sum(grepl("(?<![A-Za-z])the(?![A-Za-z])", x, perl = TRUE))

This gives a sum of 2 on your example.

Now consider a slightly more complex example:

x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the")

Running the same call on this vector gives:

[1] 3

The pattern matches "the" only when it is not immediately preceded or followed by another letter, so values such as theano and menthe are not counted.
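To see which elements the pattern accepts, rather than only the count, the logical vector returned by grepl() can be used to subset x. This is a minimal base R sketch; the names pattern, hits and boundary_hits are only for illustration. It also shows why a plain \b word boundary is not a drop-in replacement here: the underscore counts as a word character, so \bthe\b does not match inside the_dog.

```r
x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the")

# "the" not immediately preceded or followed by an ASCII letter
pattern <- "(?<![A-Za-z])the(?![A-Za-z])"
hits <- grepl(pattern, x, perl = TRUE)

x[hits]    # "the_dog" "the_cat" " the"
sum(hits)  # 3

# With \b, "_" counts as a word character, so only " the" matches
boundary_hits <- grepl("\\bthe\\b", x, perl = TRUE)
x[boundary_hits]  # " the"
```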

You can add other characters to the [] classes if they should also rule out a match. For example, if you don't consider the99 to contain the word the, use [A-Za-z0-9].

The same pattern works with stringr. The example below also excludes digits, so the99 is not counted as containing the word:

library(stringr)

sum(str_detect(x, "(?<![A-Za-z0-9])the(?![A-Za-z0-9])"))
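
To make the effect of adding 0-9 concrete, here is a small base R comparison of the two character classes. The vector x2 and the pattern names are hypothetical and chosen only to show the difference; grepl() with perl = TRUE is used so the sketch does not depend on stringr.

```r
x2 <- c("the_dog", "the99", "99the", "the")  # hypothetical test vector

letters_only   <- "(?<![A-Za-z])the(?![A-Za-z])"
letters_digits <- "(?<![A-Za-z0-9])the(?![A-Za-z0-9])"

# Digits next to "the" are allowed by the first pattern, rejected by the second
sum(grepl(letters_only, x2, perl = TRUE))    # 4
sum(grepl(letters_digits, x2, perl = TRUE))  # 2
```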
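
Alternatively, for a plain substring match with stringr:

library(stringr)
library(dplyr)  # provides tibble(), filter() and %>%

## With a vector
sum(str_detect(c("the_dog", "the_cat", "dog"), "the"))

## In a data frame
tibble(x = c("the_dog", "the_cat", "dog")) %>%
    filter(str_detect(x, "the")) %>%
    nrow()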
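
If you would rather not load any packages, the same row count can be sketched in base R. This assumes a data frame with a character column named x, as in the question; fixed = TRUE treats "the" as a literal substring rather than a regular expression, and matches is just an illustrative name.

```r
df <- data.frame(x = c("the_dog", "the_cat", "dog"), stringsAsFactors = FALSE)

# Keep the rows whose x contains the literal substring "the"
matches <- df[grepl("the", df$x, fixed = TRUE), , drop = FALSE]
nrow(matches)  # 2
```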

To see which elements match, rather than how many:

x <- c("the_dog", "the_cat", "dog")
stringr::str_detect(x, "the")
#> [1]  TRUE  TRUE FALSE

Created on 2019-02-23 by the reprex package (v0.2.1)

Try also:

x <- c("the_dog", "the_cat", "dog")
sum(stringi::stri_count(x, regex = "^the"))  # counts "the" at the start of each string

Result:

[1] 2

Or:

x <- c("the_dog", "the_cat", "dog")
sum(stringi::stri_count(x, regex = "the{1,}"))  # counts "th" followed by one or more "e", anywhere in the string
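
One caveat with counting matches: a count-based sum gives the number of occurrences, not the number of elements, so the two differ as soon as one value contains "the" more than once. The sketch below illustrates the difference in base R, using gregexpr() and regmatches() in place of stringi; the vector y is a hypothetical example.

```r
y <- c("the_dog", "the_the", "dog")  # hypothetical vector with a repeated match

# Number of matches per element: 1, 2, 0
n_matches <- lengths(regmatches(y, gregexpr("the", y)))
sum(n_matches)                      # 3 occurrences in total

# Number of elements that contain at least one match
sum(grepl("the", y, fixed = TRUE))  # 2
```

If the goal is the number of values that contain the word, as in the question, the detect-style count is the one to use.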
