R: Replace string when partial match to another column by row
I would like to replace/remove those parts of a string (name) that match other columns (state and city) in my data table.
I managed to identify the rows, e.g. using city, like so: dt %>% filter(str_detect(name, city))
But I am missing a way to use gsub (or grep) with the row values of the column city.
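I know that a rather manual approach, such as storing all city names in a vector and feeding them into gsub, would work, but it would also wrongly remove "dallas" from row 2. (It would be manageable for the states, though, and could be combined with gsub to also remove "of".)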
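To make the pitfall concrete, here is a minimal sketch of that manual approach, using two of the names from the data below. The variable names cities and res are only illustrative. Because the pattern contains every city, "dallas" is stripped from the second name even though that row's city is arecibo.

```r
# Two names from the question's data; the second row's city is "arecibo"
name <- c("frutas of pr arecibo", "dallas frutas of pr")

# All city names collapsed into one alternation pattern
cities <- c("arecibo", "cabo rojo", "new york", "dallas")
pattern <- paste(cities, collapse = "|")

# Every city is removed from every row, regardless of that row's own city
res <- gsub(pattern, "", name)
print(res)
```

The second element loses "dallas", which is why the removal has to be done with each row's own city value.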
Data and desired output
library(data.table)
dt <- data.table(
  city = c("arecibo", "arecibo", "cabo rojo", "new york", "dallas"),
  state = c("pr", "pr", "pr", "ny", "tx"),
  name = c("frutas of pr arecibo", "dallas frutas of pr", "cabo rojo metal plant", "greens new york", "cowboy shoes dallas tx"),
  desired = c("frutas", "dallas frutas", "metal plant", "greens", "cowboy shoes")
)
Here is one solution, although a gsub approach could probably get there more quickly. Anyway:
library(tidyverse)
dt %>%
mutate(test = str_remove_all(name,city)) %>%
mutate(test = str_remove_all(test,state)) %>%
mutate(test = str_remove_all(test," of ")) %>%
mutate(test = str_remove_all(test,"^ ")) %>%
mutate(test = str_remove_all(test," *$"))
Output:
        city state                   name       desired          test
1:   arecibo    pr   frutas of pr arecibo        frutas        frutas
2:   arecibo    pr    dallas frutas of pr dallas frutas dallas frutas
3: cabo rojo    pr  cabo rojo metal plant   metal plant   metal plant
4:  new york    ny        greens new york        greens        greens
5:    dallas    tx cowboy shoes dallas tx  cowboy shoes  cowboy shoes
data.table solution:
# Helper function
subxy <- function(string, rmv) mapply(function(x, y) sub(x, '', y), rmv, string)
dt[, desired2 := name |> subxy(city) |> subxy(state) |> subxy('of') |> trimws()]
#         city state                   name       desired      desired2
# 1:   arecibo    pr   frutas of pr arecibo        frutas        frutas
# 2:   arecibo    pr    dallas frutas of pr dallas frutas dallas frutas
# 3: cabo rojo    pr  cabo rojo metal plant   metal plant   metal plant
# 4:  new york    ny        greens new york        greens        greens
# 5:    dallas    tx cowboy shoes dallas tx  cowboy shoes  cowboy shoes
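The helper above passes each row's value to sub as a regular expression and removes only the first match. As a rough sketch of a variant, assuming the city and state values should be treated as literal text and every occurrence should be removed, gsub with fixed = TRUE could be used instead. The function name remove_by_row is hypothetical, and the vectors are copied from the question's data.

```r
# Hypothetical helper: remove pattern[i] from string[i], treating it as literal text
remove_by_row <- function(string, pattern) {
  unname(mapply(function(p, s) gsub(p, "", s, fixed = TRUE), pattern, string))
}

city  <- c("arecibo", "arecibo", "cabo rojo", "new york", "dallas")
state <- c("pr", "pr", "pr", "ny", "tx")
name  <- c("frutas of pr arecibo", "dallas frutas of pr", "cabo rojo metal plant",
           "greens new york", "cowboy shoes dallas tx")

out <- remove_by_row(name, city)     # drop each row's own city
out <- remove_by_row(out, state)     # drop each row's own state
out <- gsub("\\bof\\b", "", out)     # drop the standalone word "of"
out <- trimws(gsub("\\s+", " ", out)) # collapse and trim leftover whitespace
print(out)
```

Note that the state codes are still matched as plain substrings, so a code such as "pr" would also be removed from inside a longer word. Wrapping the values in word boundaries would avoid that, at the cost of going back to regular expression matching.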
library(dplyr)
library(stringr)  # provides str_remove_all()
dt %>% rowwise() %>%
mutate(desired_2 = str_remove_all(name, paste(c(city, state, 'of'), collapse = '|'))%>%
trimws())
# A tibble: 5 × 5
# Rowwise:
  city      state name                   desired       desired_2
  <chr>     <chr> <chr>                  <chr>         <chr>
1 arecibo   pr    frutas of pr arecibo   frutas        frutas
2 arecibo   pr    dallas frutas of pr    dallas frutas dallas frutas
3 cabo rojo pr    cabo rojo metal plant  metal plant   metal plant
4 new york  ny    greens new york        greens        greens
5 dallas    tx    cowboy shoes dallas tx cowboy shoes  cowboy shoes