I would like to replace/remove those parts of a string (name) that match other columns (state and city) in my data table.
I managed to identify the rows, e.g. with city, like so: dt %>% filter(str_detect(name, city)), but I am missing a way to use gsub (or grep) with the rowwise value of the column city.
I know that a rather manual approach, like storing all city names in a vector and passing them to gsub, would work, but it would also falsely remove the "dallas" in row 2. (This was manageable for states, though, and could be combined with gsub to also remove "of".)
Data and desired output
library(data.table)
dt <- data.table(city = c("arecibo", "arecibo", "cabo rojo", "new york", "dallas"),
                 state = c("pr", "pr", "pr", "ny", "tx"),
                 name = c("frutas of pr arecibo", "dallas frutas of pr", "cabo rojo metal plant", "greens new york", "cowboy shoes dallas tx"),
                 desired = c("frutas", "dallas frutas", "metal plant", "greens", "cowboy shoes"))
Here's a solution, though it can probably be achieved faster with gsub methods. Anyway:
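library(tidyverse)
dt %>%
  mutate(test = str_remove_all(name, city)) %>%    # remove each row's own city
  mutate(test = str_remove_all(test, state)) %>%   # remove each row's own state
  mutate(test = str_remove_all(test, " of ")) %>%  # remove the connecting " of "
  mutate(test = str_remove_all(test, "^ ")) %>%    # strip a leading space
  mutate(test = str_remove_all(test, " *$"))       # strip trailing spaces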
Output:
        city state                   name       desired          test
1:   arecibo    pr   frutas of pr arecibo        frutas        frutas
2:   arecibo    pr    dallas frutas of pr dallas frutas dallas frutas
3: cabo rojo    pr  cabo rojo metal plant   metal plant   metal plant
4:  new york    ny        greens new york        greens        greens
5:    dallas    tx cowboy shoes dallas tx  cowboy shoes  cowboy shoes
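One caveat with removing the raw column values: stringr treats each value as a regular expression and matches it anywhere, so a state code such as "pr" would also be stripped from inside a longer word. The following is a minimal sketch, not part of the original answer, that builds one pattern per row with word boundaries. The third row ("express prints pr" / "san juan") is hypothetical data added only to show the difference, and it assumes the city and state values contain no regex metacharacters.

```r
library(stringr)

# Rows 1-2 come from the question; row 3 is hypothetical illustration data
name  <- c("frutas of pr arecibo", "dallas frutas of pr", "express prints pr")
city  <- c("arecibo", "arecibo", "san juan")
state <- c("pr", "pr", "pr")

# One regex per row: that row's city, that row's state, or "of",
# each matched only as a whole word (\\b = word boundary)
pattern <- str_c("\\b(", city, "|", state, "|of)\\b")

# str_remove_all() is vectorised over both string and pattern,
# so row i of `name` is cleaned with row i of `pattern`
cleaned <- str_squish(str_remove_all(name, pattern))
cleaned
```

Here str_squish() replaces the separate leading/trailing whitespace steps and also collapses the double spaces left behind in the middle of a string.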
A data.table solution:
# Helper function
subxy <- function(string, rmv) mapply(function(x, y) sub(x, '', y), rmv, string)
dt[, desired2 := name |> subxy(city) |> subxy(state) |> subxy('of') |> trimws()]
#         city state                   name       desired      desired2
# 1:   arecibo    pr   frutas of pr arecibo        frutas        frutas
# 2:   arecibo    pr    dallas frutas of pr dallas frutas dallas frutas
# 3: cabo rojo    pr  cabo rojo metal plant   metal plant   metal plant
# 4:  new york    ny        greens new york        greens        greens
# 5:    dallas    tx cowboy shoes dallas tx  cowboy shoes  cowboy shoes
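Since the question asks specifically about gsub with a rowwise pattern, the same idea can be sketched in base R without any packages. This is only an illustration under stated assumptions: gsub_rowwise is a hypothetical helper name, and fixed = TRUE is chosen so that city and state values are matched literally rather than as regular expressions (which also means a value can match inside a longer word).

```r
# Three rows taken from the question's data
name  <- c("frutas of pr arecibo", "dallas frutas of pr", "cowboy shoes dallas tx")
city  <- c("arecibo", "arecibo", "dallas")
state <- c("pr", "pr", "tx")

# Hypothetical helper: remove every literal occurrence of pattern[i] from x[i]
gsub_rowwise <- function(pattern, x) {
  mapply(function(p, s) gsub(p, "", s, fixed = TRUE),
         pattern, x, USE.NAMES = FALSE)
}

step1 <- gsub_rowwise(city, name)    # drop each row's own city
step2 <- gsub_rowwise(state, step1)  # drop each row's own state

# Drop the standalone word "of", collapse repeated spaces, trim the ends
no_of  <- gsub("\\bof\\b", "", step2, perl = TRUE)
result <- trimws(gsub("\\s+", " ", no_of))
result
```

Because each row is only matched against its own city, the "dallas" in the second row is left alone, which is the behaviour a single vector of all city names would not give.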
library(dplyr)
library(stringr)  # provides str_remove_all()
dt %>% rowwise() %>%
  mutate(desired_2 = str_remove_all(name, paste(c(city, state, 'of'), collapse = '|')) %>%
           trimws())
# A tibble: 5 × 5
# Rowwise:
  city      state name                   desired       desired_2
  <chr>     <chr> <chr>                  <chr>         <chr>
1 arecibo   pr    frutas of pr arecibo   frutas        frutas
2 arecibo   pr    dallas frutas of pr    dallas frutas dallas frutas
3 cabo rojo pr    cabo rojo metal plant  metal plant   metal plant
4 new york  ny    greens new york        greens        greens
5 dallas    tx    cowboy shoes dallas tx cowboy shoes  cowboy shoes