R: Remove rows where one column is a substring of another

Question

I have a data frame that looks like this:

c1      c2
fish    fishing
dog     tomato
cat     loop
horse   horse

I would now like to delete every row where c1 == c2, or where c1 is a substring of c2 or vice versa. In my example, horse == horse and 'fish' is a substring of 'fishing'. I know about the grepl function, e.g.: df[!grepl(df$c1, df$c2),].

However, this does not test each row's c1 against that row's c2. Maybe there is a solution where I can use df[!grepl("STRING", df$c2),] for every row, so that "STRING" equals that row's value of df$c1?

Thanks in advance!

Answer 1

Using tidyverse:

library(tidyverse)
df %>%
  filter(!str_detect(c2, c1), !str_detect(c1, c2))

Output:

    c1     c2
1: dog tomato
2: cat   loop

This works regardless of which column holds the longer string (not only for the arrangement in your specific example).
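One caveat worth noting: str_detect treats its pattern as a regular expression, so values containing characters such as "." or "+" may match in unexpected ways or raise an error. A minimal sketch, assuming the columns are plain character vectors and that literal matching is what is wanted, wraps the pattern in stringr's fixed(). The data frame below is reconstructed from the question for illustration.

```r
library(dplyr)
library(stringr)

# Example data reconstructed from the question (an assumption about its exact form)
df <- data.frame(
  c1 = c("fish", "dog", "cat", "horse"),
  c2 = c("fishing", "tomato", "loop", "horse"),
  stringsAsFactors = FALSE
)

# fixed() makes stringr compare the pattern literally instead of as a regex
res <- df %>%
  filter(
    !str_detect(c2, fixed(c1)),  # drop rows where c1 occurs inside c2
    !str_detect(c1, fixed(c2))   # drop rows where c2 occurs inside c1
  )

res
```

Rows where the two values are equal are dropped as well, since a string always contains itself.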

Answer 2

Base R:

dat[!with(dat, mapply(grepl, c1, c2)) & !with(dat, mapply(grepl, c2, c1)),]
#    c1     c2
# 2 dog tomato
# 3 cat   loop

grepl only works with one pattern at a time: if you pass multiple patterns (i.e., all of dat$c1), you'll receive a warning (and not the intended output).

grepl(dat$c1, dat$c2)
# Warning in grepl(dat$c1, dat$c2) :
#   argument 'pattern' has length > 1 and only the first element will be used
# [1]  TRUE FALSE FALSE FALSE

We vectorize it with mapply, which calls grepl once for each c1/c2 pair.
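The same idea can be written out as a self-contained script. This is only a sketch: the data frame is reconstructed from the question, is_substring is a hypothetical helper name, and fixed = TRUE is an addition (not part of the answer above) so that values are compared literally rather than as regular expressions.

```r
# Example data reconstructed from the question (an assumption about its exact form)
dat <- data.frame(
  c1 = c("fish", "dog", "cat", "horse"),
  c2 = c("fishing", "tomato", "loop", "horse"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where needle[i] occurs literally inside haystack[i].
# mapply calls grepl once per pair; fixed = TRUE disables regex interpretation.
is_substring <- function(needle, haystack) {
  unname(mapply(grepl, needle, haystack, MoreArgs = list(fixed = TRUE)))
}

# Keep rows where neither column is contained in the other
keep <- !is_substring(dat$c1, dat$c2) & !is_substring(dat$c2, dat$c1)
res <- dat[keep, ]

res
```

Computing keep as a separate logical vector makes it easy to inspect which rows are being dropped before subsetting.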
