How to match multiple strings in one column with multiple strings in another column and remove the matches in R?

Here is my code:

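library(dplyr)
library(stringr)

A <- c("ruler measure", "measure rulers", "rulers")
B <- c("you can measure things with rulers", "you can measure things with rulers", "you can measure things with rulers")
# Name the columns A and B explicitly so mutate() can find them in the data frame
df <- data.frame(A, B, stringsAsFactors = FALSE)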

df_new <- df %>%
  mutate(
    new_B = str_replace_all(B, A, "")
  )

What I want is for the columns to look like this:

A                 B
ruler measure     you can things with
measure rulers    you can things with
rulers            you can measure things with

However, str_replace_all() seems to replace only one of the matches between A and B (e.g., rulers) and not the other (e.g., measure).

Thanks for your help!!

We can replace the space with | so that each word in A becomes an alternative in the regex pattern:

library(dplyr)
library(stringr)
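df %>%
  # "ruler measure" becomes the regex "ruler|measure"; str_replace_all() on A
  # handles entries with more than two words
  mutate(new_B = str_replace_all(B, str_replace_all(A, " ", "|"), ""))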
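One caveat with a plain alternation: it matches only the listed words themselves, so "ruler" leaves the trailing "s" of "rulers" behind, and each removed word leaves a doubled space. A minimal sketch of one way around that, assuming each word in A should also swallow any letters that follow it (the \\w* suffix and the str_squish() cleanup are additions for illustration, not part of the answer above, and the words in A are still interpreted as regex):

```r
library(dplyr)
library(stringr)

df <- data.frame(
  A = c("ruler measure", "measure rulers", "rulers"),
  B = rep("you can measure things with rulers", 3),
  stringsAsFactors = FALSE
)

df_new <- df %>%
  mutate(
    # "ruler measure" -> "\\b(ruler|measure)\\w*": match each word at a word
    # boundary, plus any trailing word characters (so "ruler" covers "rulers")
    pattern = paste0("\\b(", str_replace_all(A, "\\s+", "|"), ")\\w*"),
    # str_replace_all() is vectorised over both B and pattern (row by row);
    # str_squish() collapses the leftover whitespace
    new_B = str_squish(str_replace_all(B, pattern, ""))
  )

df_new$new_B
```

The pattern column is kept only to make the generated regex easy to inspect; it can be dropped with select(-pattern).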

Here is a base R solution:

df <- within(df,
             new_B <- mapply(gsub, 
                             sapply(strsplit(as.character(A),"\\s+"),
                                    function(v) paste0(paste0("\\s+?",v,".*?\\b"),collapse = "|")),
                             "",
                             B))

which gives

> df
               A                                   B                       new_B
1  ruler measure  you can measure things with rulers         you can things with
2 measure rulers you can measures things with rulers         you can things with
3         rulers  you can measure things with rulers you can measure things with
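To unpack the one-liner: for each row, strsplit() breaks A into words, each word becomes a small pattern that also consumes the whitespace in front of it and any letters after it, and the pieces are joined with |. The same idea can be sketched in separate steps. The helper name remove_words is hypothetical, and the pattern here is a slightly simplified variant (\\s*\\b, the word, then \\w*) rather than the exact regex above:

```r
df <- data.frame(
  A = c("ruler measure", "measure rulers", "rulers"),
  B = rep("you can measure things with rulers", 3),
  stringsAsFactors = FALSE
)

# Hypothetical helper: remove every word of `a` (plus trailing letters) from `b`
remove_words <- function(a, b) {
  words <- strsplit(a, "\\s+")[[1]]
  # One alternative per word: optional leading whitespace, a word boundary,
  # the word itself, then any remaining word characters
  pattern <- paste0("\\s*\\b", words, "\\w*", collapse = "|")
  # trimws() tidies the result when the removed word was at the start
  trimws(gsub(pattern, "", b, perl = TRUE))
}

# Apply the helper row by row
df$new_B <- mapply(remove_words, df$A, df$B, USE.NAMES = FALSE)
df$new_B
```

As with the other approaches, the words in A are treated as regular expressions, so entries containing characters such as . or ( would need escaping first.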

