How to remove all duplicated rows in data.table in R

Let's say we have

library(data.table)    
dt <- data.table(Date = c(201405,201405,201504,201505, 201505,201505), ID = c(500,500,600,700,500, 700), INC = c(20,30,50,75,80,90))

which returns

     Date  ID INC
1: 201405 500  20
2: 201405 500  30
3: 201504 600  50
4: 201505 700  75
5: 201505 500  80
6: 201505 700  90

I want to remove every row whose ID appears more than once within the same Date, including the first occurrence. The result should be

     Date  ID INC
1: 201504 600  50
2: 201505 500  80

Could you suggest how to do this?

We group by 'ID' and build a logical index with duplicated on 'Date'. A single call to duplicated flags only the second and later copies, so it is combined with a second call using fromLast=TRUE to flag every copy. Negating the result leaves TRUE for the elements that occur only once. We then use .I to get the row index, extract the index column 'V1', and use that to subset 'dt'.

dt[dt[, .I[!(duplicated(Date)|duplicated(Date, fromLast=TRUE))], ID]$V1]
#      Date  ID INC
#1: 201505 500  80
#2: 201504 600  50
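To make the role of fromLast=TRUE easier to see, the one-liner above can be unpacked into intermediate steps. This is only an illustrative sketch: the names flags, row_id, dup_fwd, dup_any, keep and result are made up here and are not part of the original answer.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Per ID, record the row number and two flags on Date:
#   dup_fwd: TRUE only for the second and later copies of a Date
#   dup_any: TRUE for every copy, because the fromLast pass catches the first one
flags <- dt[, .(row_id  = .I,
                dup_fwd = duplicated(Date),
                dup_any = duplicated(Date) | duplicated(Date, fromLast = TRUE)),
            by = ID]

# Keep the row numbers that were never flagged; sort() restores the
# original row order, which grouping by ID does not preserve.
keep   <- sort(flags$row_id[!flags$dup_any])
result <- dt[keep]
print(result)
```

With the sample data, dup_fwd alone would leave rows 1 and 4 in place, which is why both passes are needed.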

Another option is to group by 'Date' and 'ID' and, if the group contains exactly one row ( .N==1 ), return the Subset of Data.table ( .SD ).

dt[, if(.N==1) .SD, .(Date, ID)]
#     Date  ID INC
#1: 201504 600  50
#2: 201505 500  80
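A variation on the same idea collects row numbers with .I instead of returning .SD for each group, which may help on large tables because no per-group subset has to be built. This is a sketch rather than a benchmarked recommendation; idx and res are illustrative names.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# For each (Date, ID) group, .I holds the group's row numbers in dt.
# Indexing with the length-one logical .N == 1L keeps them all when the
# group has a single row and returns integer(0) otherwise.
idx <- dt[, .I[.N == 1L], by = .(Date, ID)]$V1

# Subset the original table by the surviving row numbers.
res <- dt[idx]
print(res)
```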

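Or, as @Frank mentioned, we can use a data.table/base R combo. Here ave returns, for every row, the size of its ('Date', 'ID') group, and we keep the rows where that size is 1. Note that the comparison with 1L has to sit outside the ave call (inside FUN, the logical result is coerced to 0/1 and then treated as row numbers), and the table is called dt, not DT.

dt[ave(seq_len(.N), Date, ID, FUN = length) == 1L]
#     Date  ID INC
#1: 201504 600  50
#2: 201505 500  80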
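If the same clean-up is needed for different key columns, the logic can be wrapped in a small function built on the duplicated method that data.table provides for whole tables, which accepts by and fromLast arguments. The helper name drop_repeated and its cols argument are hypothetical and chosen only for this sketch.

```r
library(data.table)

# Hypothetical helper: drop every row whose combination of `cols`
# occurs more than once in `x` (all copies are removed, not just the extras).
drop_repeated <- function(x, cols) {
  dup <- duplicated(x, by = cols) |
    duplicated(x, by = cols, fromLast = TRUE)
  keep <- !dup
  x[keep]  # a single symbol in i is looked up in the calling scope
}

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

out <- drop_repeated(dt, c("Date", "ID"))
print(out)
```

Because the rows are selected with a logical vector, the result keeps the original row order and column order of dt.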
