I'm trying to create a new column based on whether one column is a substring of another. Using if_else() with grepl() works when the pattern is a constant, but not when comparing two columns to each other.
library(dplyr)
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "second st", "third st"))
df <- df %>% dplyr::mutate(test = if_else(grepl("st", col2, fixed = TRUE), 1, 0)) # WORKS
df <- df %>% dplyr::mutate(test2 = if_else(grepl(col1, col2, fixed = TRUE), 1, 0)) # WARNING, wrong result
Warning message:
Problem with `mutate()` column `test`.
i `test = if_else(grepl(col1, col2, fixed = TRUE), 1, 0)`.
i argument 'pattern' has length > 1 and only the first element will be used
> df
           col1            col2 test test2
1  first street first street #6    1     1
2     second st       second st    1     0  <--- should be 1
3 third st apt1        third st    1     0
Why can't I use both columns in grepl()? Other functions work fine inside mutate(); for instance, test3 = paste(col1, col2) returns the expected result.
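grepl() is not vectorized over its pattern argument: when given a vector, it uses only the first element, which is what the warning says. You could use rowwise() before the mutate(), or you could use str_detect() from stringr, which is vectorized over both the string and the pattern: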
library(tidyverse)
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))
df <- df %>% rowwise() %>% dplyr::mutate(test2 = if_else(grepl(col1, col2, fixed = TRUE), 1, 0))
df
#> # A tibble: 3 × 3
#> # Rowwise:
#>   col1          col2            test2
#>   <chr>         <chr>           <dbl>
#> 1 first street  first street #6     1
#> 2 second st     2nd st              0
#> 3 third st apt1 third st            0
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))
# Note: str_detect() treats col1 as a regular expression by default
df <- df %>% dplyr::mutate(test2 = if_else(str_detect(col2, col1), 1, 0))
df
#>            col1            col2 test2
#> 1  first street first street #6     1
#> 2     second st          2nd st     0
#> 3 third st apt1        third st     0
Created on 2022-02-01 by the reprex package (v2.0.1)
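To make the cause concrete, here is a minimal sketch using the data from the question. The variable names first_only and pairwise are only illustrative. It shows what the failing grepl() call effectively computes, and a pairwise alternative with stringr. Because str_detect() interprets its pattern as a regular expression by default, the sketch wraps the pattern in fixed() to get the literal matching that fixed = TRUE asked for.

```r
library(stringr)

col1 <- c("first street", "second st", "third st apt1")
col2 <- c("first street #6", "second st", "third st")

# grepl() takes a single pattern, so grepl(col1, col2, fixed = TRUE)
# effectively tests only col1[1] against every element of col2.
first_only <- grepl(col1[1], col2, fixed = TRUE)

# str_detect() is vectorised over both string and pattern, so element i of
# col1 is matched against element i of col2; fixed() requests literal matching.
pairwise <- str_detect(col2, fixed(col1))

print(first_only)
print(pairwise)
```

With literal matching, characters such as "." or "(" in an address are not treated as regex metacharacters, which is probably what you want for a substring test.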
Or create a function and apply it to your data:
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))
# Return 1 if the single string x occurs literally in the single string y, else 0
a <- function(x, y) {
  if (grepl(x, y, fixed = TRUE)) {
    b <- 1
  } else {
    b <- 0
  }
  return(b)
}
# Apply a() pairwise: is col1 a substring of col2?
df |> dplyr::mutate(test = mapply(a, col1, col2))
#>            col1            col2 test
#> 1  first street first street #6    1
#> 2     second st          2nd st    0
#> 3 third st apt1        third st    0
# Swap the arguments to check the reverse: is col2 a substring of col1?
df |> dplyr::mutate(test = mapply(a, col2, col1))
#>            col1            col2 test
#> 1  first street first street #6    0
#> 2     second st          2nd st    0
#> 3 third st apt1        third st    1
Created on 2022-02-01 by the reprex package (v2.0.1)
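The helper function is not strictly necessary. As a base-R sketch of the same idea, mapply() can call grepl() directly on each pair of values, with fixed = TRUE passed through MoreArgs. Note one small difference from the code above: as.integer() yields integer 1/0 rather than double.

```r
df <- data.frame(col1 = c("first street", "second st", "third st apt1"),
                 col2 = c("first street #6", "2nd st", "third st"))

# mapply() passes the i-th element of col1 as `pattern` and the i-th element
# of col2 as `x`; USE.NAMES = FALSE keeps the result an unnamed vector.
df$test <- as.integer(mapply(grepl, df$col1, df$col2,
                             MoreArgs = list(fixed = TRUE),
                             USE.NAMES = FALSE))
print(df)
```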