
New column based on if col1 is a substring of col2

I'm trying to create a new column that indicates whether the value in one column is a substring of the value in another. Combining if_else() and grepl() works when the pattern is a constant, but not when the two columns are compared row by row.

library(dplyr)

df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "second st", "third st"))

df <- df %>% dplyr::mutate(test = if_else(grepl("st", col2, fixed = TRUE), 1, 0)) # works
df <- df %>% dplyr::mutate(test2 = if_else(grepl(col1, col2, fixed = TRUE), 1, 0)) # warning, wrong result

Warning message:
Problem with `mutate()` column `test`.
i `test = if_else(grepl(col1, col2, fixed = TRUE), 1, 0)`.
i argument 'pattern' has length > 1 and only the first element will be used 

>df
    col1            col2               test test2
1   first street    first street #6    1     1
2   second st       second st          1     0    <--- should be 1
3   third st apt1   third st           1     0

Why can't I pass two columns to grepl()? Other functions handle columns fine inside mutate(); for instance, test3 = paste(col1, col2) returns the expected result.

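You could use rowwise() before the mutate(), so that grepl() receives one pattern per row, or you could use str_detect() from stringr, which is vectorised over both the string and the pattern. Note that str_detect() treats col1 as a regular expression unless the pattern is wrapped in fixed():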

library(tidyverse)
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))

df <- df %>% rowwise() %>% dplyr::mutate(test2 = if_else(grepl(col1, col2,fixed=TRUE),1,0)) 
df
#> # A tibble: 3 × 3
#> # Rowwise: 
#>   col1          col2            test2
#>   <chr>         <chr>           <dbl>
#> 1 first street  first street #6     1
#> 2 second st     2nd st              0
#> 3 third st apt1 third st            0


df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))

df <- df %>% dplyr::mutate(test2 = if_else(str_detect(col2, col1),1,0)) 
df
#>            col1            col2 test2
#> 1  first street first street #6     1
#> 2     second st          2nd st     0
#> 3 third st apt1        third st     0

Created on 2022-02-01 by the reprex package (v2.0.1)
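
As for why the original attempt fails: grepl() is vectorised over its x argument but not over pattern, so only the first element of col1 is ever used, which is what the warning reports. The following is a minimal sketch of that behaviour using plain vectors in place of the data frame columns; the variable names whole_column and first_only are made up for illustration.

```r
col1 <- c("first street", "second st", "third st apt1")
col2 <- c("first street #6", "second st", "third st")

# paste() is vectorised over all of its arguments, so elements pair up
pasted <- paste(col1, col2)

# grepl() only uses pattern[1]; the warning is suppressed here on purpose
whole_column <- suppressWarnings(grepl(col1, col2, fixed = TRUE))

# ...which makes it equivalent to searching every row for "first street"
first_only <- grepl(col1[1], col2, fixed = TRUE)

identical(whole_column, first_only)
#> [1] TRUE
```

This is why row 2 in the question comes out as 0: it is being tested against "first street" rather than "second st".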

Or create a function and apply it to your data:

df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))

# Returns 1 if x is a literal substring of y, otherwise 0 (scalar inputs only)
a <- function(x, y) {
  if (grepl(x, y, fixed = TRUE)) {
    b <- 1
  } else {
    b <- 0
  }
  return(b)
}

df |> dplyr::mutate(test = mapply(function(x,y) a(x, y), col1, col2))
#>            col1            col2 test
#> 1  first street first street #6    1
#> 2     second st          2nd st    0
#> 3 third st apt1        third st    0
df |> dplyr::mutate(test = mapply(function(x,y) a(x, y), col2, col1))
#>            col1            col2 test
#> 1  first street first street #6    0
#> 2     second st          2nd st    0
#> 3 third st apt1        third st    1

Created on 2022-02-01 by the reprex package (v2.0.1)
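
A possible variation on the approach above, sketched in base R only: the element-wise pairing can live inside the helper, so the caller does not need to wrap it in mapply(). The name is_substring is hypothetical and not part of any package; the sketch assumes literal (fixed) matching is wanted, as in the question.

```r
# Hypothetical helper: 1 if needle[i] is a literal substring of haystack[i], else 0
is_substring <- function(needle, haystack) {
  hits <- mapply(grepl, needle, haystack,
                 MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  as.integer(hits)
}

df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))

df$test <- is_substring(df$col1, df$col2)      # is col1 inside col2?
df$test_rev <- is_substring(df$col2, df$col1)  # is col2 inside col1?
```

Because the helper takes and returns whole vectors, it should also be usable inside mutate(), for example mutate(test = is_substring(col1, col2)).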
