data.table - Filter rows grouped by ID in R

Question

So I have data:

library(data.table)

# example data
ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
year <- as.numeric(c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016))
mydata <- cbind(ID, year)  # cbind() returns a character matrix, so year becomes character
mydata <- as.data.table(mydata)
mydata$year <- as.numeric(mydata$year)  # convert year back to numeric

For this data, I have found out which IDs have at least three years of consecutive history:

mydata2 <- mydata[, grp := cumsum(c(0, diff(year)==1)), by = ID][,max_grp := max(grp), by=ID][max_grp>=2]

Now I want to keep only the last three years of data for every ID:

mydata2 <- mydata2[which(year >= max(year - 2)), by = ID]

The results are correct, but I get a warning here:

Warning message:
In `[.data.table`(mydata2, which(year >= max(year - 3)), by = ID) :
  Ignoring by= because j= is not supplied

Basically, I want to filter for IDs that have three consecutive years of history, and where they have more than three consecutive years, I want to keep only three years.

Is there a better way to do this? This doesn't seem very robust, even though I only have limited experience.

Answer 1

Perhaps you are looking for this:

library(data.table)
mydata2[, .SD[year >= max(year) - 2], by = ID]

#    ID year grp max_grp
# 1:  A 2013   1       3
# 2:  A 2014   2       3
# 3:  A 2015   3       3
# 4:  B 2013   0       4
# 5:  B 2014   1       4
# 6:  B 2015   2       4
# 7:  B 2013   3       4
# 8:  B 2014   4       4
# 9:  D 2013   0       2
#10:  D 2014   1       2
#11:  D 2015   2       2
#12:  E 2015   1       2
#13:  E 2016   2       2
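
The warning in the question appears because data.table only applies by= when a j expression is supplied. With a filter in i alone, max(year) is computed over the whole table instead of per ID. As an alternative to the .SD form, a minimal sketch that collects row numbers per group with .I and subsets once; the toy table dt and the name keep are made up for illustration and are not the data from the question:

```r
library(data.table)

# Hypothetical toy table, not the data from the question
dt <- data.table(
  ID   = c("A", "A", "A", "A", "B", "B", "B"),
  year = c(2012, 2013, 2014, 2015, 2014, 2015, 2016)
)

# Because j is present, the expression is evaluated once per ID.
# .I holds the row numbers (within dt) of the current group, so this
# collects the rows whose year is within two years of that ID's latest year.
keep <- dt[, .I[year >= max(year) - 2], by = ID]$V1

# Subset the original table once with the collected row numbers
result <- dt[keep]
print(result)
```

Both forms should select the same rows; the .I variant tends to be suggested for larger tables because it avoids building .SD for every group.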

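Note that max_grp >= 2 also lets ID E through, although its years (2012, 2013, 2015, 2016) never contain three consecutive years: grp keeps counting across the gap between 2013 and 2015. One possible sketch of a stricter filter follows. It assumes that each ID-year pair should count only once, that years should be sorted within each ID, and that "last three years" means the last three rows of the most recent run of consecutive years; the column names run and run_len are introduced here for illustration:

```r
library(data.table)

# Example data from the question, built directly as a data.table
mydata <- data.table(
  ID   = c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4)),
  year = c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016)
)

# Assumption: drop duplicate ID-year pairs and sort chronologically per ID
dt <- unique(mydata)
setorder(dt, ID, year)

# Start a new run whenever the gap to the previous year is not exactly 1
dt[, run := cumsum(c(TRUE, diff(year) != 1)), by = ID]

# Length of each run of consecutive years
dt[, run_len := .N, by = .(ID, run)]

# Keep only runs of at least three years; per ID, take the last three
# rows of the most recent such run
res <- dt[run_len >= 3, tail(.SD[run == max(run)], 3), by = ID]
print(res[, .(ID, year)])
```

With these assumptions, C and E are dropped because neither has a run of three consecutive years, and A, B and D each keep 2013 to 2015.
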
Answer 1: 1 accepted, 2020-05-15 08:55:25
