Remove rows from data.table that meet condition
I have a data table:
library(data.table)
DT <- data.table(col1 = c("a", "b", "c", "c", "a"), col2 = c("b", "a", "c", "a", "b"), condition = c(TRUE, FALSE, FALSE, TRUE, FALSE))
DT
col1 col2 condition
1: a b TRUE
2: b a FALSE
3: c c FALSE
4: c a TRUE
5: a b FALSE
and would like to remove rows on the following conditions:
1. each row for which condition==TRUE (rows 1 and 4)
2. each row that has the same values for col1 and col2 as a row for which condition==TRUE (that is row 5, col1=a, col2=b)
3. each row that has the same values for col1 and col2 as a row for which condition==TRUE, but with col1 and col2 switched (that is row 2, col1=b and col2=a)
So only row 3 should stay.
I'm doing this by making a new data table DTcond with all rows meeting the condition, looping over the values for col1 and col2, and collecting the indices of the rows in DT that will be removed.
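# Rows that meet the condition
DTcond <- DT[condition == TRUE, ]
indices <- integer(0)
for (i in seq_len(nrow(DTcond))) {
  n1 <- DTcond[i, col1]
  n2 <- DTcond[i, col2]
  # Row numbers in DT matching the pair (n1, n2) in either order
  indices <- c(indices, DT[(col1 == n1 & col2 == n2) | (col1 == n2 & col2 == n1), which = TRUE])
}
# Drop the collected rows
DT[!indices, ]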
col1 col2 condition
1: c c FALSE
This works but is terribly slow for large datasets, and I guess there must be other ways in data.table to do this without loops or apply. Any suggestions on how I could improve this (I'm new to data.table)?
You can do an anti join:
mDT = DT[(condition), !"condition"][, rbind(.SD, rev(.SD), use.names = FALSE)]
DT[!mDT, on=names(mDT)]
# col1 col2 condition
# 1: c c FALSE
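In case the one-liner above is hard to follow, here is a rough step-by-step sketch of the same anti-join idea. The names flagged and swapped are only illustrative, and the swapped table is built with an explicit column swap where the answer uses rev(.SD), so treat it as an equivalent reading under that assumption, not the original code.

```r
library(data.table)

DT <- data.table(col1 = c("a", "b", "c", "c", "a"),
                 col2 = c("b", "a", "c", "a", "b"),
                 condition = c(TRUE, FALSE, FALSE, TRUE, FALSE))

# Step 1: keep the rows where condition is TRUE and drop the condition column.
# "flagged" is an illustrative name, not part of the original answer.
flagged <- DT[(condition), !"condition"]

# Step 2: the same pairs with col1 and col2 switched (explicit swap,
# used here in place of rev(.SD)).
swapped <- flagged[, .(col1 = col2, col2 = col1)]

# Step 3: stack both orientations into one lookup table of pairs to remove.
mDT <- rbind(flagged, swapped)

# Step 4: anti join. Keep only the rows of DT whose (col1, col2) pair
# does not appear in mDT.
result <- DT[!mDT, on = c("col1", "col2")]
print(result)
```

Because the matching is done by one join over the whole table instead of one scan of DT per flagged row, this should scale much better than the loop in the question.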