data.table - filtering rows grouped by ID in R
So I have data:
# example data
library(data.table)
ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
year <- c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016)
# data.table() keeps year numeric; cbind() would coerce both columns to character
mydata <- data.table(ID, year)
For this data, I have found out which IDs have at least three years of consecutive history:
mydata2 <- mydata[, grp := cumsum(c(0, diff(year)==1)), by = ID][,max_grp := max(grp), by=ID][max_grp>=2]
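One caveat on this step: `cumsum(c(0, diff(year) == 1))` counts every one-year step within an ID, not the length of a single unbroken run, so `max_grp` can reach 2 without three consecutive years. In the example data, E has 2012, 2013, 2015 and 2016 (two separate two-year runs), yet its `max_grp` is 2 and it passes the filter. Below is a minimal sketch that measures run length directly instead. It assumes, like the original code, that rows are already in chronological order within each ID; the column names `run` and `run_len` and the object `ids_3plus` are hypothetical.

```r
library(data.table)

# Example data from the question
ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
year <- c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016)
mydata <- data.table(ID, year)

# Start a new run id whenever the step from the previous row is not exactly +1 year
mydata[, run := cumsum(c(1, diff(year) != 1)), by = ID]

# Length of each run = number of rows sharing the same ID and run id
mydata[, run_len := .N, by = .(ID, run)]

# IDs containing at least one run of three or more consecutive years
ids_3plus <- mydata[run_len >= 3, unique(ID)]
print(ids_3plus)
```

With this definition E is no longer selected, because its longest run is two years.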
Now I want to keep only the last three years of data for every ID:
mydata2 <- mydata2[which(year >= max(year - 2)), by = ID]
The results are correct, but I get a warning here:
Warning message:
In `[.data.table`(mydata2, which(year >= max(year - 3)), by = ID) :
Ignoring by= because j= is not supplied
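The warning means what it says: data.table applies `by` only when a `j` expression is supplied. An `i` expression is evaluated once over the whole table, so `max(year)` in that call is the maximum across all IDs (2016 here), not per ID. The sketch below contrasts the two and shows one common idiom for a grouped row filter: collect the qualifying row numbers per group with `.I`, then subset once. The object names `global`, `keep` and `per_id` are made up for illustration.

```r
library(data.table)

ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
year <- c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016)
mydata <- data.table(ID, year)

# Step from the question: keep IDs whose grp counter reaches 2
mydata2 <- mydata[, grp := cumsum(c(0, diff(year) == 1)), by = ID][
  , max_grp := max(grp), by = ID][max_grp >= 2]

# Ungrouped (what the question's call effectively does): max(year) is the
# global maximum, 2016, so every ID is cut to year >= 2014
global <- mydata2[year >= max(year) - 2]

# Grouped: .I holds row numbers of mydata2; per ID, keep those that pass
# the condition, then subset the table once with the collected numbers
keep <- mydata2[, .I[year >= max(year) - 2], by = ID]$V1
per_id <- mydata2[keep]
print(per_id)
```

Here `global` has 9 rows while `per_id` has 13; the grouped version returns the same rows as the `.SD` approach in the answer.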
Basically, I want to keep the IDs that have at least three consecutive years, and if an ID has more than three years of consecutive history, I want to keep only three of those years.
Is there a better way to do this? This doesn't seem very robust, although I only have limited experience.
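Taken literally, the goal is: per ID, find a run of at least three consecutive years and keep the last three years of it. A minimal sketch of one way to do that follows. It assumes each ID/year pair should count once (so duplicates such as A's second 2012 are dropped and rows are sorted by year first) and that "last" means the latest qualifying run; `dt`, `run`, `run_len` and `res` are hypothetical names.

```r
library(data.table)

ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
year <- c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016)
mydata <- data.table(ID, year)

# Assumption: each ID/year pair counts once, so drop duplicates and sort by year
dt <- unique(mydata)[order(ID, year)]

# New run id whenever the gap to the previous year is not exactly 1
dt[, run := cumsum(c(1, diff(year) != 1)), by = ID]
dt[, run_len := .N, by = .(ID, run)]

# Keep only runs of 3+ consecutive years; per ID, take the latest such run
# and return its last three years
res <- dt[run_len >= 3, .(year = tail(year[run == max(run)], 3)), by = ID]
print(res)
```

Under these assumptions A, B and D each keep 2013 to 2015, while C and E drop out because neither has three consecutive years.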
Perhaps you are looking for this:
library(data.table)
mydata2[, .SD[year >= max(year) - 2], by = ID]
# ID year grp max_grp
# 1: A 2013 1 3
# 2: A 2014 2 3
# 3: A 2015 3 3
# 4: B 2013 0 4
# 5: B 2014 1 4
# 6: B 2015 2 4
# 7: B 2013 3 4
# 8: B 2014 4 4
# 9: D 2013 0 2
#10: D 2014 1 2
#11: D 2015 2 2
#12: E 2015 1 2
#13: E 2016 2 2
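Note in this output that B keeps five rows (2013 and 2014 each appear twice) and E keeps only two, because `year >= max(year) - 2` selects a calendar window rather than three consecutive years. A quick diagnostic sketch, reusing the answer's expression, that counts rows and distinct years per ID and flags anything other than three of each; `res` and `check` are hypothetical names.

```r
library(data.table)

ID <- c(rep("A", 5), rep("B", 6), rep("C", 2), rep("D", 3), rep("E", 4))
year <- c(rep(c(2012, 2013, 2014, 2015), 4), 2012, 2013, 2015, 2016)
mydata <- data.table(ID, year)
mydata2 <- mydata[, grp := cumsum(c(0, diff(year) == 1)), by = ID][
  , max_grp := max(grp), by = ID][max_grp >= 2]

# The answer's grouped filter
res <- mydata2[, .SD[year >= max(year) - 2], by = ID]

# Rows and distinct years kept per ID
check <- res[, .(rows = .N, years = uniqueN(year)), by = ID]
print(check)

# IDs that did not end up with exactly three distinct rows/years
print(check[rows != 3 | years != 3])
```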