
R: Replace string when partial match to another column by row

I want to replace/remove those parts of a string (name) that match other columns (state, city) in my data table.

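I managed to identify the rows, for example by city, like this: dt %>% filter(str_detect(name, city)). What I am missing is a way to use gsub (or grep) with the per-row values of the city column.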

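I know that a rather manual approach would work, such as storing all city names in a vector and feeding them into gsub, but that would also wrongly remove "dallas" from row 2. (This is manageable for the states, though, and could be combined with gsub to also remove the "of".)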
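To make the problem with the manual approach concrete, here is a minimal base R sketch. The names `cities`, `pattern`, and `global` are illustrative and not part of the original post; the sketch only shows that a single global pattern strips "dallas" from the second name, even though that row's city is arecibo.

```r
# Names copied from the example data in the question
name <- c("frutas of pr arecibo", "dallas frutas of pr", "cabo rojo metal plant",
          "greens new york", "cowboy shoes dallas tx")

# Hypothetical lookup vector holding every distinct city
cities <- c("arecibo", "cabo rojo", "new york", "dallas")

# One alternation pattern applied to all rows: any city is removed from any row
pattern <- paste(cities, collapse = "|")
global <- trimws(gsub(pattern, "", name))

print(global[2])
# "frutas of pr" rather than the desired "dallas frutas"
```

This is why the removal has to be driven by each row's own city and state values rather than by one shared pattern.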


Data and desired output

library(data.table)

dt <- data.table(city = c("arecibo", "arecibo", "cabo rojo", "new york", "dallas"),
                 state = c("pr", "pr", "pr", "ny", "tx"),
                 name = c("frutas of pr arecibo", "dallas frutas of pr", "cabo rojo metal plant", "greens new york", "cowboy shoes dallas tx"),
                 desired = c("frutas", "dallas frutas", "metal plant", "greens", "cowboy shoes"))

Here is one solution, although a gsub approach might get there faster. Anyway:

library(tidyverse)
dt %>% 
  mutate(test = str_remove_all(name,city)) %>% 
  mutate(test = str_remove_all(test,state)) %>% 
  mutate(test = str_remove_all(test," of ")) %>% 
  mutate(test = str_remove_all(test,"^ ")) %>% 
  mutate(test = str_remove_all(test," *$"))

Output:

        city state                   name       desired          test
1:   arecibo    pr   frutas of pr arecibo        frutas        frutas
2:   arecibo    pr    dallas frutas of pr dallas frutas dallas frutas
3: cabo rojo    pr  cabo rojo metal plant   metal plant   metal plant
4:  new york    ny        greens new york        greens        greens
5:    dallas    tx cowboy shoes dallas tx  cowboy shoes  cowboy shoes

A data.table solution:

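# Helper function: for each element of string, remove the first match of the
# paired pattern in rmv (a scalar rmv is recycled across all rows)
subxy <- function(string, rmv) mapply(function(x, y) sub(x, '', y), rmv, string)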

dt[,  desired2 := name |> subxy(city) |> subxy(state) |> subxy('of') |> trimws()]

#         city state                   name       desired      desired2
# 1:   arecibo    pr   frutas of pr arecibo        frutas        frutas
# 2:   arecibo    pr    dallas frutas of pr dallas frutas dallas frutas
# 3: cabo rojo    pr  cabo rojo metal plant   metal plant   metal plant
# 4:  new york    ny        greens new york        greens        greens
# 5:    dallas    tx cowboy shoes dallas tx  cowboy shoes  cowboy shoes
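The row-wise idea can also be sketched in base R with word boundaries, so that a short term such as "of" is only removed when it stands alone. The helper name `strip_row_terms` is hypothetical, and the sketch assumes that the city and state values contain no regex metacharacters, since they are pasted into the pattern unescaped.

```r
# Hypothetical helper: remove each row's own terms, matching whole words only
strip_row_terms <- function(string, ...) {
  # paste() is vectorised, so this builds one pattern per row,
  # e.g. "\\b(arecibo|pr|of)\\b" for the first row
  pattern <- paste0("\\b(", paste(..., sep = "|"), ")\\b")
  out <- mapply(function(p, s) gsub(p, "", s, perl = TRUE),
                pattern, string, USE.NAMES = FALSE)
  # Collapse the whitespace left behind, then trim both ends
  trimws(gsub("\\s+", " ", out))
}

# Columns copied from the example data in the question
name  <- c("frutas of pr arecibo", "dallas frutas of pr", "cabo rojo metal plant",
           "greens new york", "cowboy shoes dallas tx")
city  <- c("arecibo", "arecibo", "cabo rojo", "new york", "dallas")
state <- c("pr", "pr", "pr", "ny", "tx")

result <- strip_row_terms(name, city, state, "of")
print(result)
```

With a plain 'of' pattern, a name such as "coffee shop of pr" would lose the "of" inside "coffee"; the word-boundary version leaves that word intact. Inside the data.table this could be used as dt[, desired3 := strip_row_terms(name, city, state, "of")], where desired3 is just an illustrative column name.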

library(dplyr)
library(stringr)  # provides str_remove_all()

dt %>% rowwise() %>%
    mutate(desired_2 = str_remove_all(name, paste(c(city, state, 'of'), collapse = '|'))%>%
               trimws())

# A tibble: 5 × 5
# Rowwise: 
  city      state name                   desired       desired_2    
  <chr>     <chr> <chr>                  <chr>         <chr>        
1 arecibo   pr    frutas of pr arecibo   frutas        frutas       
2 arecibo   pr    dallas frutas of pr    dallas frutas dallas frutas
3 cabo rojo pr    cabo rojo metal plant  metal plant   metal plant  
4 new york  ny    greens new york        greens        greens       
5 dallas    tx    cowboy shoes dallas tx cowboy shoes  cowboy shoes 

