
R: Delete rows where one column is a substring of another

I have a data frame that looks like this:

c1      c2
fish    fishing
dog     tomato
cat     loop
horse   horse

I would now like to delete every row where c1 == c2, and every row where c1 is a substring of c2 or vice versa. In my example, horse == horse and 'fish' is a substring of 'fishing'. I know about the grepl function, e.g.: df[!grepl(df$c1, df$c2),].

However, this solution does not account for substrings. Maybe there is a solution where I can use df[!grepl("STRING", df$c2),] for every row, so that "STRING" equals the value of df$c1 in that row?

Thanks in advance!

Using tidyverse:

library(tidyverse)
df %>%
  filter(!str_detect(c2, c1), !str_detect(c1, c2))

Output:

    c1     c2
1: dog tomato
2: cat   loop

This works regardless of which column holds the longer string (not only when c1 is contained in c2, as in your specific example).
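One caveat worth illustrating: str_detect treats its pattern as a regular expression by default, so values containing characters such as "." or "(" may match more than a literal substring would. The sketch below assumes dplyr and stringr are installed and wraps the pattern in stringr's fixed() to force literal matching. The extra row ("a.c" / "abc") is a hypothetical addition to the question's data, included only to show the difference.

```r
library(dplyr)
library(stringr)

# Question's data plus one hypothetical row containing a regex metacharacter
df <- tibble(
  c1 = c("fish", "dog", "cat", "horse", "a.c"),
  c2 = c("fishing", "tomato", "loop", "horse", "abc")
)

# fixed() makes str_detect compare literally, so "a.c" is not
# treated as "a, any character, c" when matched against "abc"
result <- df %>%
  filter(!str_detect(c2, fixed(c1)), !str_detect(c1, fixed(c2)))

print(result)
```

With the plain regex version, the "a.c" row would also be dropped, because the regex a.c matches "abc". Whether that is desirable depends on the data, so treat fixed() as an option rather than a requirement.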

base R

dat[!with(dat, mapply(grepl, c1, c2)) & !with(dat, mapply(grepl, c2, c1)),]
#    c1     c2
# 2 dog tomato
# 3 cat   loop

grepl only works on one pattern at a time: if you pass multiple patterns (i.e., all of dat$c1), you'll receive a warning (and not the intended output).

grepl(dat$c1, dat$c2)
# Warning in grepl(dat$c1, dat$c2) :
#   argument 'pattern' has length > 1 and only the first element will be used
# [1]  TRUE FALSE FALSE FALSE

We vectorize it (with mapply) so that grepl runs once for each c1/c2 pair.
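For a self-contained illustration of the pairwise approach, here is a minimal sketch that rebuilds the question's data frame and wraps the mapply call in a small helper. The name is_substring is hypothetical (it is not part of base R), and fixed = TRUE is an assumption added here so that values are compared literally rather than as regular expressions.

```r
# Example data from the question
dat <- data.frame(
  c1 = c("fish", "dog", "cat", "horse"),
  c2 = c("fishing", "tomato", "loop", "horse"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: TRUE where needle[i] occurs literally in haystack[i].
# mapply calls grepl(needle[i], haystack[i], fixed = TRUE) for each pair.
is_substring <- function(needle, haystack) {
  unname(mapply(grepl, needle, haystack, MoreArgs = list(fixed = TRUE)))
}

# Drop a row if either column is contained in the other (equal strings included)
drop <- is_substring(dat$c1, dat$c2) | is_substring(dat$c2, dat$c1)
result <- dat[!drop, ]

print(result)
```

On the example data this should keep only the dog/tomato and cat/loop rows, matching the output shown above. Checking both directions is what removes rows where c2 is the shorter string.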


