How to remove all duplicated rows in a data.table in R

Question

Let's say we have

library(data.table)    
dt <- data.table(Date = c(201405,201405,201504,201505, 201505,201505), ID = c(500,500,600,700,500, 700), INC = c(20,30,50,75,80,90))

which returns

     Date  ID INC
1: 201405 500  20
2: 201405 500  30
3: 201504 600  50
4: 201505 700  75
5: 201505 500  80
6: 201505 700  90

I want to remove every row whose ID appears more than once within the same Date, including the first occurrence. The result should be

     Date  ID INC
1: 201504 600  50
2: 201505 500  80

Could you please suggest how to do this?

Answer 1

We group by 'ID' and build a logical index that flags every 'Date' occurring more than once within the group: duplicated(Date) marks the second and later occurrences, and duplicated(Date, fromLast=TRUE) marks the earlier ones. Negating the combined index leaves TRUE only for the unique elements. We then use .I to get the matching row indices, extract the index column 'V1', and use it to subset 'dt'.

dt[dt[, .I[!(duplicated(Date)|duplicated(Date, fromLast=TRUE))], ID]$V1]
#      Date  ID INC
#1: 201505 500  80
#2: 201504 600  50
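
To make the steps above concrete, the following sketch evaluates the pieces one at a time on the example table. The intermediate names `mask`, `row_id`, `repeated`, `idx` and `res` are introduced here purely for illustration and are not part of the original answer.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Within each ID, flag every Date that occurs more than once.
# duplicated() alone marks only the second and later copies, so it is
# combined with fromLast = TRUE to mark the first copy as well.
mask <- dt[, .(row_id   = .I,
               repeated = duplicated(Date) | duplicated(Date, fromLast = TRUE)),
           by = ID]

# Row numbers (in the original table) of the rows that are not repeated.
idx <- mask[repeated == FALSE, row_id]

# Subset the original table with those row numbers.
res <- dt[idx]
```

Because the index is collected group by group, the rows come back in ID order rather than in their original order, which is why ID 500 is listed before ID 600 in the output above.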

Another option is to group by 'Date' and 'ID' and, if the group has exactly one row ( .N==1 ), return the Subset of Data.table ( .SD ).

dt[, if(.N==1) .SD, .(Date, ID)]
#     Date  ID INC
#1: 201504 600  50
#2: 201505 500  80
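
As a quick check of why the `.N==1` test works, the sketch below counts the rows in each (Date, ID) group of the example table; only groups of size 1 survive the filter. The names `counts` and `res` are illustrative and do not come from the original answer.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Number of rows in each (Date, ID) group, in order of first appearance.
counts <- dt[, .N, by = .(Date, ID)]

# Keep a group's rows only when the group has exactly one row.
# When the condition is FALSE, if() returns NULL and the group is dropped.
res <- dt[, if (.N == 1L) .SD, by = .(Date, ID)]
```

Here the groups (201405, 500) and (201505, 700) each contain two rows and are dropped, while (201504, 600) and (201505, 500) contain one row each and are kept.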

Or, as @Frank mentioned, we can use a data.table/base R combo: ave() returns the size of each (Date, ID) group for every row, and we keep the rows whose group size is 1.

dt[ave(seq_len(.N), Date, ID, FUN = length) == 1L]
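
For readers unfamiliar with `ave()`, here is a small sketch of what it computes on the example table. It is written with explicit `dt$` column references so that it does not depend on how data.table evaluates `i`; the names `group_size` and `res` are illustrative.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# For every row, the number of rows sharing its (Date, ID) combination.
# ave() applies FUN to each group and writes the result back in row order.
group_size <- ave(seq_len(nrow(dt)), dt$Date, dt$ID, FUN = length)

# Keep only the rows whose (Date, ID) combination occurs exactly once.
res <- dt[group_size == 1L]
```

Note that the comparison with 1 is applied to the vector that `ave()` returns. Placing it inside `FUN` would not give a logical index, because `ave()` writes the results back into the integer vector it was given.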

Answer 1: score 5, accepted, 2015-10-21 06:29:34
