Delete rows if one column's entry is a substring of another column's entry

Question

I have a data frame with two columns, V1 and V2, whose entries look like A1, A2, A1+A2, and A3.

I want to delete a row if the entry in either column is a substring of the entry in the other column. So, for example, I would want to delete rows like these:

A1, A1+A2

A1+A2, A1

but not rows like this:

A1+A2, A3

I am currently using this code:

subset(dat, !dat$V1 %in% dat$V2)

but this code also removes rows such as A1/B1, A2-B2 and A 02, A4, which I want to keep.
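A likely cause: `%in%` asks whether each V1 value occurs, exactly, anywhere in the V2 column. It does not compare values within the same row, and it does not look at substrings. A minimal sketch with two made-up rows (the vectors `v1` and `v2` are illustrative only, not the question's data):

```r
# Two hypothetical rows: ("A1/B1", "A2") and ("A2", "A3")
v1 <- c("A1/B1", "A2")
v2 <- c("A2", "A3")

# %in% checks membership in the whole v2 column, ignoring row position,
# so row 2 is flagged only because "A2" appears in row 1 of v2
flag <- v1 %in% v2
flag
```

Here `subset(..., !flag)` would drop the second row even though neither "A2" nor "A3" contains the other.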

I think I could use charmatch, perhaps like this:

subset(dat, charmatch(dat$V1, dat$V2) == "NA")

but this returns an empty data frame.

When I run this code to check which rows charmatch would remove:

trial <- subset(dat, charmatch(dat$V1, dat$V2) != "NA")

rows such as A1/B1, A2-B2 and A 02, A4 appear, even though I want to keep those rows.

I think the problem might be that A 02 contains a space, but I am not sure how to resolve this.
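`charmatch()` probably behaves differently than intended here. It does prefix (partial) matching of each element of its first argument against the whole second argument, again not row by row, and returns an integer index, `0` for an ambiguous match, or `NA` for no match. Comparing that integer with the string `"NA"` never yields `TRUE`, and `subset()` treats a missing condition as false, which would explain the empty data frame; `is.na()` is the usual test. The space in values like `A 02` is matched like any other character. A small sketch with hypothetical values:

```r
# charmatch() does prefix matching against the whole table
exact     <- charmatch("A 05", c("A 05", "A 06"))  # 1: exact match
ambiguous <- charmatch("A 0",  c("A 05", "A 06"))  # 0: two entries start with "A 0"
no_match  <- charmatch("B3",   "A3-B3")            # NA: "B3" is not a prefix

# Comparing the integer result with the string "NA" never gives TRUE
m <- charmatch(c("A1", "Z9"), c("A1+A2", "A3"))    # 1 NA
m == "NA"    # FALSE NA  (subset() treats NA as FALSE, so nothing is kept)
is.na(m)     # FALSE TRUE
```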

I also thought about using grep/grepl and regular expressions, but I am not sure what the syntax would look like when matching one column's values against another column. Would I convert the first column into a vector and use:

subset(dat, !grepl(V1vector, dat$V2))

?
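`grepl()` takes a single pattern (given a longer vector, it uses only the first element and warns), so a row-by-row test has to loop over the rows. A sketch using `vapply()` over hypothetical vectors, with `fixed = TRUE` so that characters such as `+` are matched literally rather than as regular-expression operators:

```r
v1 <- c("A1", "A3-B3", "A1+A2")   # hypothetical V1 values
v2 <- c("A1+A2", "B3", "A1+A2")   # hypothetical V2 values, same rows

# Row-wise test: does v2[i] contain v1[i]? (fixed = TRUE => literal match)
row_contains <- vapply(seq_along(v1),
                       function(i) grepl(v1[i], v2[i], fixed = TRUE),
                       logical(1))
row_contains   # TRUE FALSE TRUE

# Without fixed = TRUE, "+" is a regex quantifier, so "A1+A2" does not match itself
grepl("A1+A2", "A1+A2")                 # FALSE
grepl("A1+A2", "A1+A2", fixed = TRUE)   # TRUE
```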

Any ideas?

Here is a sample of the dataset:

V1      V2
A3-B3   B3
A4/B4   A3-B3
A 28    A 05
A 28    A 06
A2-B2   A2
B 05    B1

And this is what I would like the result to look like:

V1      V2
A4/B4   A3-B3
A 28    A 05
A 28    A 06
B 05    B1

Answer 1

Try this:

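# fixed = TRUE matches V2 literally, so "+" in entries like "A1+A2" is not a regex operator
dat[!mapply(grepl, dat$V2, dat$V1, MoreArgs = list(fixed = TRUE)), ]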

Answer 2

Minimal dataset:

f <- structure(list(V1 = c("A3-B3", "A4/B4", "A 28", "A 28", "A2-B2", 
"B 05"), V2 = c("B3", "A3-B3", "A 05", "A 06", "A2", "B1")), .Names = c("V1", 
"V2"), row.names = c(NA, -6L), class = "data.frame")

##entries of V1 that contain V2
mapply(grepl, f$V2, f$V1, MoreArgs=list(fixed=TRUE)) 
##entries of V2 that contain V1
mapply(grepl, f$V1, f$V2, MoreArgs=list(fixed=TRUE))

##combine the two negations
f[!mapply(grepl, f$V2, f$V1, MoreArgs=list(fixed=TRUE)) & 
  !mapply(grepl, f$V1, f$V2, MoreArgs=list(fixed=TRUE)),]
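The combined condition above can be wrapped in a small reusable function. This is a sketch, not part of either answer: the name `drop_substring_rows` and its column arguments are hypothetical, and the data frame is the question's sample rebuilt with `data.frame()`.

```r
# Hypothetical helper: drop rows where either column's value occurs inside
# the other column's value (same row) as a literal substring.
drop_substring_rows <- function(d, a = "V1", b = "V2") {
  x <- as.character(d[[a]])   # as.character() also handles factor columns
  y <- as.character(d[[b]])
  a_in_b <- mapply(grepl, x, y, MoreArgs = list(fixed = TRUE))  # x[i] inside y[i]
  b_in_a <- mapply(grepl, y, x, MoreArgs = list(fixed = TRUE))  # y[i] inside x[i]
  d[!(a_in_b | b_in_a), , drop = FALSE]
}

dat <- data.frame(
  V1 = c("A3-B3", "A4/B4", "A 28", "A 28", "A2-B2", "B 05"),
  V2 = c("B3", "A3-B3", "A 05", "A 06", "A2", "B1"),
  stringsAsFactors = FALSE
)
drop_substring_rows(dat)
```

Because `fixed = TRUE` compares the entries as literal strings, values containing `+`, `/`, `-`, or spaces need no escaping.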


Answer 1: score 0, accepted, posted 2013-08-15 19:11:37

Answer 2: score 0, posted 2013-08-15 19:19:32
