
Compare strings in two columns in R

Suppose I've got this data:

 ColA               ColB             
------------       ------------------------
 apple tree         Mary has an apple tree
 orange+apple       Lucy loves orange+apple
 orange apple       Anne loves orange+apple

I want to evaluate if ColB contains ColA and create a logical variable:

 ColA               ColB                        Ind
------------       ------------------------     -----
 apple tree         Mary has an apple tree      TRUE
 orange+apple       Lucy loves orange+apple     TRUE
 orange apple       Anne loves orange+apple     FALSE

Any suggestions using R?

Many thanks!

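We can use str_detect, which is vectorized over both the string and the pattern, so each ColB value is checked against the ColA value in the same row. Wrapping ColA in fixed() makes the match literal, so the + in "orange+apple" is not interpreted as a regex quantifier.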

library(dplyr)
library(stringr)
df1 <- df1 %>%
           mutate(Ind = str_detect(ColB, fixed(ColA)))

-output

df1
#         ColA                    ColB   Ind
#1   apple tree  Mary has an apple tree  TRUE
#2 orange+apple Lucy loves orange+apple  TRUE
#3 orange apple Anne loves orange+apple FALSE
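
To see why the literal match matters for this data, here is a minimal base R sketch comparing regex matching with fixed matching on the second row. It uses grepl rather than str_detect only to avoid extra packages; the variable names are illustrative and not part of the original answer.

```r
# Illustration only: regex matching versus literal (fixed) matching.
pattern  <- "orange+apple"
sentence <- "Lucy loves orange+apple"

# As a regex, "e+" means "one or more e", so the pattern looks for
# "orang", then one or more "e", then "apple". The literal "+" in the
# sentence breaks that match.
regex_match <- grepl(pattern, sentence)

# With fixed = TRUE every character, including "+", is matched as-is.
literal_match <- grepl(pattern, sentence, fixed = TRUE)

print(regex_match)    # FALSE
print(literal_match)  # TRUE
```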

data

df1 <- structure(list(ColA = c("apple tree", "orange+apple", "orange apple"
), ColB = c("Mary has an apple tree", "Lucy loves orange+apple", 
"Anne loves orange+apple")), class = "data.frame", row.names = c(NA, 
-3L))

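Here is a base R option using Vectorize over grepl, with fixed = TRUE so that ColA is matched literally rather than as a regex

# within() returns a modified copy; assign it back to df1 to keep the new column
within(
  df1,
  Ind <- Vectorize(grepl)(ColA, ColB, fixed = TRUE)
)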

giving

          ColA                    ColB   Ind
1   apple tree  Mary has an apple tree  TRUE
2 orange+apple Lucy loves orange+apple  TRUE
3 orange apple Anne loves orange+apple FALSE
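
Since Vectorize is a wrapper around mapply, the same row-wise check can also be sketched with mapply directly. This is an alternative under the same assumptions as above (the df1 sample data from this page), not the answerer's own code; USE.NAMES = FALSE is added here so the result is a plain unnamed logical vector.

```r
# Sample data as given on this page.
df1 <- data.frame(
  ColA = c("apple tree", "orange+apple", "orange apple"),
  ColB = c("Mary has an apple tree", "Lucy loves orange+apple",
           "Anne loves orange+apple"),
  stringsAsFactors = FALSE
)

# Call grepl once per row: ColA supplies the pattern, ColB the string to search.
# MoreArgs passes fixed = TRUE to every call so patterns are matched literally.
df1$Ind <- mapply(grepl, df1$ColA, df1$ColB,
                  MoreArgs = list(fixed = TRUE),
                  USE.NAMES = FALSE)

print(df1$Ind)  # TRUE TRUE FALSE
```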
