How to remove all duplicated rows in a data.table in R
Let's say we have
library(data.table)
dt <- data.table(Date = c(201405,201405,201504,201505, 201505,201505), ID = c(500,500,600,700,500, 700), INC = c(20,30,50,75,80,90))
which returns:
Date ID INC
1: 201405 500 20
2: 201405 500 30
3: 201504 600 50
4: 201505 700 75
5: 201505 500 80
6: 201505 700 90
I want to remove every row whose ID occurs more than once within the same Date, including the first occurrence. The result should be:
Date ID INC
1: 201504 600 50
2: 201505 500 80
Could you please suggest a way to do this?
We group by 'ID' and build a logical index with duplicated on 'Date', scanning in both directions (the second call uses fromLast=TRUE) so that every copy of a repeated Date is flagged, not just the later ones. Negating that index leaves TRUE only for Dates that occur exactly once within the ID. We then use .I to get the corresponding row numbers, extract the index column 'V1', and use it to subset 'dt'.
dt[dt[, .I[!(duplicated(Date)|duplicated(Date, fromLast=TRUE))], ID]$V1]
# Date ID INC
#1: 201505 500 80
#2: 201504 600 50
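The one-liner above can be unpacked into separate steps to see what each part contributes. This is only a sketch on the question's data; the intermediate names idx, rows and res are made up for illustration.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Step 1: within each ID, keep the row numbers (.I) of Dates that occur once.
# duplicated() alone flags only the later copies, so it is combined with a
# backwards scan (fromLast = TRUE) to flag the first copy as well.
idx <- dt[, .I[!(duplicated(Date) | duplicated(Date, fromLast = TRUE))], by = ID]

# Step 2: the unnamed j result is stored in column V1, next to the grouping column ID.
rows <- idx$V1

# Step 3: subset the original table by those row numbers.
res <- dt[rows]
```

Because the row numbers are collected group by group, the result comes back ordered by the first appearance of each ID rather than by original row order.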
Another option is to group by 'Date' and 'ID' and, if the group has exactly one row (.N==1), return the Subset of Data.table (.SD) for that group. Groups with more than one row return nothing and so drop out of the result.
dt[, if(.N==1) .SD, .(Date, ID)]
# Date ID INC
#1: 201504 600 50
#2: 201505 500 80
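To make the .N test visible, the group sizes can be computed first and then joined back. This is a sketch of the same idea in two steps, not a replacement for the one-liner; counts, keep and res are illustrative names.

```r
library(data.table)

dt <- data.table(Date = c(201405, 201405, 201504, 201505, 201505, 201505),
                 ID   = c(500, 500, 600, 700, 500, 700),
                 INC  = c(20, 30, 50, 75, 80, 90))

# Number of rows in each (Date, ID) group; the count column is named N.
counts <- dt[, .N, by = .(Date, ID)]

# Keep only the key columns of the groups that contain a single row.
keep <- counts[N == 1L, .(Date, ID)]

# Join back to dt to recover the full rows for those groups.
res <- dt[keep, on = .(Date, ID)]
```

Inspecting counts is handy when checking which (Date, ID) pairs were dropped and how many rows each one had.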
Or, as @Frank mentioned, we can use a data.table/base R combo:
# ave() returns the size of each row's (Date, ID) group; keep rows whose group has size 1
dt[ave(seq(.N), Date, ID, FUN = length) == 1L]
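The base R half of that combo relies on ave(), which returns one value per input element, computed within groups. Note that ave() keeps the type of its first argument, so a logical FUN result applied to an integer vector comes back as 0/1 integers, which would act as row positions rather than a filter when used for subsetting. A minimal sketch on plain vectors, with illustrative names grp_size and keep:

```r
Date <- c(201405, 201405, 201504, 201505, 201505, 201505)
ID   <- c(500, 500, 600, 700, 500, 700)

# For every element, the number of elements sharing its (Date, ID) combination.
grp_size <- ave(seq_along(Date), Date, ID, FUN = length)

# Comparing afterwards yields a proper logical vector suitable for subsetting.
keep <- grp_size == 1L
```

Here keep is TRUE only for the third and fifth elements, matching the two rows in the expected result.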