How to extract the last date of a subsequence in a data.frame in R?
I've been struggling for a while with the following dataset:
id date var1 var2
1 7031 2008-12-01 27 1
2 7031 2009-01-05 6 0
3 7031 2009-02-02 0 3
4 7031 2008-11-01 1 4
5 7500 2009-07-11 30 0
6 7500 2009-10-01 8 0
7 7500 2010-01-01 0 0
8 7041 2009-06-20 26 0
9 7041 2009-08-01 0 0
10 0277 2009-01-01 3 0
For each id, I would like to output the last date on which at least one of the variables is non-zero. The time series for these users have different lengths.
I expect output something like:
id last_date
7031 2009-02-02
7500 2009-10-01
7041 2009-06-20
0277 2009-01-01
Any help would be appreciated!
First, subset your data, and then use aggregate():
Here's your sample data:
x <- read.table(header = TRUE, stringsAsFactors = FALSE,
                colClasses = c(id = "character"), text = "
id date var1 var2
1 '7031' 2008-12-01 27 1
2 '7031' 2009-01-05 6 0
3 '7031' 2009-02-02 0 3
4 '7031' 2008-11-01 1 4
5 '7500' 2009-07-11 30 0
6 '7500' 2009-10-01 8 0
7 '7500' 2010-01-01 0 0
8 '7041' 2009-06-20 26 0
9 '7041' 2009-08-01 0 0
10 '0277' 2009-01-01 3 0")
Quoting the ids alone does not keep the leading zero of "0277", because read.table() still converts quoted numbers to integers. colClasses = c(id = "character") keeps id as text.
Make sure that your "date" variable values are represented by actual dates and not characters.
x$date <- as.Date(x$date)
Subset, keeping the rows where at least one of var1 and var2 is non-zero:
x2 <- with(x, x[!(var1 == 0 & var2 == 0), ])
Aggregate, taking the latest remaining date for each id:
aggregate(date ~ id, x2, max)
#     id       date
# 1 0277 2009-01-01
# 2 7031 2009-02-02
# 3 7041 2009-06-20
# 4 7500 2009-10-01
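If you also want the other columns of the row that holds each id's last non-zero date (not just the date itself), a base-R alternative to aggregate() is to split the subset by id and pick the row with which.max(). This is a sketch that assumes you want the full row; the filter x$var1 != 0 | x$var2 != 0 is just the De Morgan form of the !(var1 == 0 & var2 == 0) condition used above.

```r
# Sample data from the question; colClasses keeps id as character ("0277" keeps its zero)
x <- read.table(header = TRUE, stringsAsFactors = FALSE,
                colClasses = c(id = "character"), text = "
id date var1 var2
1 '7031' 2008-12-01 27 1
2 '7031' 2009-01-05 6 0
3 '7031' 2009-02-02 0 3
4 '7031' 2008-11-01 1 4
5 '7500' 2009-07-11 30 0
6 '7500' 2009-10-01 8 0
7 '7500' 2010-01-01 0 0
8 '7041' 2009-06-20 26 0
9 '7041' 2009-08-01 0 0
10 '0277' 2009-01-01 3 0")
x$date <- as.Date(x$date)

# Keep rows where at least one of var1/var2 is non-zero
x2 <- x[x$var1 != 0 | x$var2 != 0, ]

# For each id, keep the whole row that has the latest date
last_rows <- do.call(rbind, lapply(split(x2, x2$id), function(d) d[which.max(d$date), ]))
rownames(last_rows) <- NULL
print(last_rows)
```

which.max() returns the first maximum, so if an id had two non-zero rows on the same latest date, only the first of them would be kept.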
If you don't want to create a new object for the subsetted data, you can also use:
aggregate(date ~ id, x[!(x$var1 == 0 & x$var2 == 0), ], max)
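The same idea extends to more than two variables. The sketch below assumes (hypothetically) that every column whose name starts with "var" is one of the variables to check; rowSums() over the logical matrix x[vars] != 0 counts the non-zero variables in each row, and the result column is renamed to last_date to match the output asked for in the question.

```r
# Sample data from the question; colClasses keeps id as character ("0277" keeps its zero)
x <- read.table(header = TRUE, stringsAsFactors = FALSE,
                colClasses = c(id = "character"), text = "
id date var1 var2
1 '7031' 2008-12-01 27 1
2 '7031' 2009-01-05 6 0
3 '7031' 2009-02-02 0 3
4 '7031' 2008-11-01 1 4
5 '7500' 2009-07-11 30 0
6 '7500' 2009-10-01 8 0
7 '7500' 2010-01-01 0 0
8 '7041' 2009-06-20 26 0
9 '7041' 2009-08-01 0 0
10 '0277' 2009-01-01 3 0")
x$date <- as.Date(x$date)

# Assumption: every column named "var..." is a variable to check
vars <- grep("^var", names(x), value = TRUE)

# TRUE for rows where at least one of those variables is non-zero
keep <- rowSums(x[vars] != 0) > 0

# Latest kept date per id, renamed to match the requested output
res <- aggregate(date ~ id, x[keep, ], max)
names(res)[names(res) == "date"] <- "last_date"
print(res)
```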