Finding a value in R that contains a certain string

Question

Is there a way in R to find values in a column that contain a word? For example, I want to find all the values that contain the word "the", where some values of the column are "the_cat", "the_dog", and "dog":

x <- c("the_dog", "the_cat", "dog")

Using the example above, the answer would be 2. I know this is relatively easy to do in Python, but I am wondering if there is a way to do this in R. Thanks!

Answer 1

Try:

sum(grepl("(?<![A-Za-z])the(?![A-Za-z])", x, perl = TRUE))

This gives a sum of 2 on your example.
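To see how the one-liner arrives at that number, it can help to split it into steps. This is only a sketch in base R; the names pattern and hits are introduced here for illustration and are not part of the original answer.

```r
x <- c("the_dog", "the_cat", "dog")

# Same pattern as above: "the" with no letter directly before or after it
pattern <- "(?<![A-Za-z])the(?![A-Za-z])"

# grepl() returns one TRUE/FALSE per element of x
hits <- grepl(pattern, x, perl = TRUE)
hits
#> [1]  TRUE  TRUE FALSE

# TRUE counts as 1 when summed, so this is the number of matching values
sum(hits)
#> [1] 2

# The matching values themselves, and their positions in x
x[hits]
#> [1] "the_dog" "the_cat"
which(hits)
#> [1] 1 2
```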

But let's consider also a slightly more complex example:

x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the")

Running the same command on this vector gives:

[1] 3

Here the pattern matches "the" only when it is not immediately preceded or followed by another letter, so "theano" and "menthe" are not counted, while " the" is.

You can add other characters to the [] classes to exclude more cases. For example, if you don't want "the99" to count as the word "the", use [A-Za-z0-9].
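The difference between the two character classes is easiest to see side by side. The sketch below adds "the99" to the example vector (my addition, not in the original data) and compares both patterns; the variable names are illustrative.

```r
x <- c("the_dog", "the_cat", "dog", "theano", "menthe", " the", "the99")

# Letters only: a digit next to "the" is still allowed
letters_only <- grepl("(?<![A-Za-z])the(?![A-Za-z])", x, perl = TRUE)

# Letters and digits: "the99" is now rejected as well
letters_digits <- grepl("(?<![A-Za-z0-9])the(?![A-Za-z0-9])", x, perl = TRUE)

x[letters_only]
#> [1] "the_dog" "the_cat" " the"    "the99"
x[letters_digits]
#> [1] "the_dog" "the_cat" " the"
```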

You can also use the same pattern with stringr. The example below also excludes digits, so "the99" would not be counted as the word "the":

library(stringr)

sum(str_detect(x, "(?<![A-Za-z0-9])the(?![A-Za-z0-9])"))

Answer 2

library(stringr)
library(dplyr)  # provides tibble(), filter() and %>%

## With a vector
sum(str_detect(c("the_dog", "the_cat", "dog"), "the"))

## In a data frame
tibble(x = c("the_dog", "the_cat", "dog")) %>%
    filter(str_detect(x, "the")) %>%
    nrow()
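If you would rather not load any packages, roughly the same thing can be done in base R. This is a sketch, assuming the column is called x as above; df and matches are names made up for the example.

```r
df <- data.frame(x = c("the_dog", "the_cat", "dog"), stringsAsFactors = FALSE)

# Keep the rows whose x contains the literal substring "the";
# drop = FALSE keeps the result a data frame even with a single column
matches <- df[grepl("the", df$x, fixed = TRUE), , drop = FALSE]

nrow(matches)
#> [1] 2
```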

Answer 3

x <- c("the_dog", "the_cat", "dog") 
stringr::str_detect(x, "the")
#> [1]  TRUE  TRUE FALSE

Created on 2019-02-23 by the reprex package (v0.2.1)
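One thing to keep in mind with a plain "the" pattern is that it matches the substring anywhere, not just the standalone word. A small base R sketch with two extra values ("theano" and "menthe", borrowed from Answer 1) shows the effect; grepl() with fixed = TRUE is used here as a stand-in for a literal substring search.

```r
x <- c("the_dog", "the_cat", "dog", "theano", "menthe")

# Literal substring search: every value containing "the" anywhere is TRUE
contains_the <- grepl("the", x, fixed = TRUE)
contains_the
#> [1]  TRUE  TRUE FALSE  TRUE  TRUE
```

Whether that is what you want depends on whether "theano" should count as containing the word.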

Answer 4

Try also:

x <- c("the_dog", "the_cat", "dog")
sum(stringi::stri_count(x, regex = "^the"))  # matches "the" at the beginning

Result:

[1] 2

Or:

x <- c("the_dog", "the_cat", "dog")
# Counts every occurrence of "the", so a value containing it twice adds 2
sum(stringi::stri_count(x, regex = "the"))
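Note that counting occurrences and counting values are not the same thing once a value contains the string more than once. The sketch below illustrates this in base R rather than stringi (a swapped-in technique, so it runs without extra packages); "the_the" is an extra value added for the illustration.

```r
x <- c("the_dog", "the_cat", "dog", "the_the")

# Number of occurrences of "the" in each element
occurrences <- lengths(regmatches(x, gregexpr("the", x, fixed = TRUE)))
occurrences
#> [1] 1 1 0 2

# Total occurrences across the vector
sum(occurrences)
#> [1] 4

# Number of values that contain "the" at least once
sum(grepl("the", x, fixed = TRUE))
#> [1] 3
```

For the original question (how many values contain the string), the second number is the one you want.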

solution1
1 2019-02-23 16:42:36

solution2
0 2019-02-23 16:39:23

solution3
0 2019-02-23 16:40:49

solution4
0 2019-02-23 17:24:26