Summarize rows with identical identifiers, keeping the earliest start, latest end and highest value, in R data.table
Consider the following data.table:
library(data.table)
dt <- data.table(
ID = c(1,2,2,2,2),
Value1 = c('a','b','a','a','a'),
Start = c('2001-01-01','2000-01-01','2000-02-02','2000-03-03','2000-03-03'),
End = c('2002-01-01','2001-01-01','2001-02-02','2001-03-03','2001-03-03'),
Value_max = c(2,50,20,40,80)
)
ID Value1 Start End Value_max
1: 1 a 2001-01-01 2002-01-01 2
2: 2 b 2000-01-01 2001-01-01 50
3: 2 a 2000-02-02 2001-02-02 20
4: 2 a 2000-03-03 2001-03-03 40
5: 2 a 2000-03-03 2001-03-03 80
I want to combine rows with identical ID and Value1, extracting the earliest Start, the latest End, and the highest Value_max. I have used
dt[, .SD[which.max(Value_max)], by = .(ID, Value1)]
but don't know how to combine it with the earliest Start and latest End dates.
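To see why the which.max attempt on its own is not enough, here is a minimal sketch that runs it on the same data. It keeps the single row with the highest Value_max per group, so Start and End are taken from that row rather than from the group's extremes. For ID 2 / Value1 'a' the chosen row is the 2000-03-03 one, not the earliest start (2000-02-02). The object name best_row is only illustrative.

```r
library(data.table)

dt <- data.table(
  ID = c(1, 2, 2, 2, 2),
  Value1 = c('a', 'b', 'a', 'a', 'a'),
  Start = c('2001-01-01', '2000-01-01', '2000-02-02', '2000-03-03', '2000-03-03'),
  End = c('2002-01-01', '2001-01-01', '2001-02-02', '2001-03-03', '2001-03-03'),
  Value_max = c(2, 50, 20, 40, 80)
)

# .SD[which.max(Value_max)] returns the whole row holding the largest Value_max
# in each (ID, Value1) group; Start/End come from that row, not min/max of the group
best_row <- dt[, .SD[which.max(Value_max)], by = .(ID, Value1)]
print(best_row)
```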
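min and max seem to be enough. This works even though Start and End are character columns, because ISO 8601 dates such as 2000-02-02 sort chronologically when compared as strings: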
dt[,.(earliest = min(Start),latest = max(End), value_max = max(Value_max)),by=.(ID,Value1)]
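If the date strings might not all be in the same zero-padded YYYY-MM-DD form, string comparison is no longer safe. A minimal sketch, assuming you are free to convert the columns in place, is to turn them into data.table's IDate class with as.IDate before aggregating, so that min and max compare actual dates. The grouping expression is the same as in the answer; the name res is only illustrative.

```r
library(data.table)

dt <- data.table(
  ID = c(1, 2, 2, 2, 2),
  Value1 = c('a', 'b', 'a', 'a', 'a'),
  Start = c('2001-01-01', '2000-01-01', '2000-02-02', '2000-03-03', '2000-03-03'),
  End = c('2002-01-01', '2001-01-01', '2001-02-02', '2001-03-03', '2001-03-03'),
  Value_max = c(2, 50, 20, 40, 80)
)

# Convert the character date columns to IDate (integer-backed dates) by reference
dt[, `:=`(Start = as.IDate(Start), End = as.IDate(End))]

# Same aggregation as the answer, now operating on real dates
res <- dt[, .(earliest = min(Start), latest = max(End), value_max = max(Value_max)),
          by = .(ID, Value1)]
print(res)
```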
ID Value1 earliest latest value_max
1: 1 a 2001-01-01 2002-01-01 2
2: 2 b 2000-01-01 2001-01-01 50
3: 2 a 2000-02-02 2001-03-03 80