How to find the mean of n consecutive days in each group in R
I have a data frame that contains id (with duplicates), date (with duplicates) and value. The values are recorded on different consecutive days. What I want is to group the data frame by id and by date (in blocks of n consecutive days) and find the mean of the values in each block, returning NA if the last group does not contain n days.
id date value
1 2016-10-5 2
1 2016-10-6 3
1 2016-10-7 1
1 2016-10-8 2
1 2016-10-9 5
2 2013-10-6 2
. . .
. . .
. . .
20 2012-2-6 10
Desired output with n = 3 consecutive days:
id date value group_n_consecutive_days mean_n_consecutive_days
1 2016-10-5 2 1 2
1 2016-10-6 3 1 2
1 2016-10-7 1 1 2
1 2016-10-8 2 2 NA
1 2016-10-9 5 2 NA
2 2013-10-6 2 1 4
.
.
.
.
20 2012-2-6 10 6 25
The data in the question is sorted and consecutive within id, so we assume that this is the case. Also, when the question refers to duplicate dates, we assume this means that different id values can have the same date, but within an id the dates are unique and consecutive. Using the data shown reproducibly at the end, group by id and compute the group numbers using gl. Then, grouping by id and group_no, take the mean of each group of 3, or NA for smaller groups.
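library(dplyr)

DF %>%
  group_by(id) %>%
  # number successive blocks of 3 rows within each id: 1, 1, 1, 2, 2, 2, ...
  mutate(group_no = as.integer(gl(n(), 3, n()))) %>%
  # .add = TRUE keeps the id grouping (the older add = TRUE is deprecated)
  group_by(group_no, .add = TRUE) %>%
  # mean of complete blocks of 3; NA for a trailing block with fewer rows
  mutate(mean = if (n() == 3) mean(value) else NA_real_) %>%
  ungroup()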
giving:
# A tibble: 6 x 5
id date value group_no mean
<int> <date> <int> <int> <dbl>
1 1 2016-10-05 2 1 2
2 1 2016-10-06 3 1 2
3 1 2016-10-07 1 1 2
4 1 2016-10-08 2 2 NA
5 1 2016-10-09 5 2 NA
6 2 2013-10-06 2 1 NA
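To make the block length a parameter rather than a hard-coded 3, the same idea can be written in base R with ave() instead of dplyr. This is a sketch, not part of the original answer: block_mean is a hypothetical helper name, and like the answer it assumes the rows are already sorted by date within each id.

```r
# Hypothetical helper (not from the original answer): base R version of the
# dplyr pipeline, with the block length n as a parameter.
# Assumes rows are already sorted by date within each id.
block_mean <- function(DF, n = 3) {
  # Block number within each id: n rows of 1, then n rows of 2, ...
  DF$group_no <- ave(seq_len(nrow(DF)), DF$id,
                     FUN = function(i) as.integer(gl(length(i), n, length(i))))
  # Mean of each complete block of n rows; NA for a shorter trailing block
  DF$mean <- ave(DF$value, DF$id, DF$group_no,
                 FUN = function(v) rep(if (length(v) == n) mean(v) else NA_real_,
                                       length(v)))
  DF
}

# Example data matching the question
DF <- data.frame(
  id    = c(1L, 1L, 1L, 1L, 1L, 2L),
  date  = as.Date(c("2016-10-05", "2016-10-06", "2016-10-07",
                    "2016-10-08", "2016-10-09", "2013-10-06")),
  value = c(2, 3, 1, 2, 5, 2)
)

block_mean(DF, n = 3)  # mean column: 2 2 2 NA NA NA
```

Calling block_mean(DF, n = 2) instead splits id 1 into blocks of 2 rows, so the fifth row forms an incomplete block and gets NA.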
An alternative to gl(...) could be cumsum(rep(1:3, length.out = n()) == 1), and an alternative to if (n() == 3) mean(value) else NA could be mean(head(c(value, NA, NA), 3)).
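A quick base R check that the alternatives behave like the originals for the block sizes that can actually occur (1 to 3 rows). The helper names below are hypothetical, chosen for illustration only; note that mean_head would differ from mean_if for a block longer than 3, which the grouping never produces.

```r
# Hypothetical helper names, for illustration only.
# Block numbering for a group of m rows, in blocks of 3
blocks_gl     <- function(m) as.integer(gl(m, 3, m))
blocks_cumsum <- function(m) cumsum(rep(1:3, length.out = m) == 1)

# Mean of a block, or NA if it has fewer than 3 values
mean_if   <- function(v) if (length(v) == 3) mean(v) else NA
mean_head <- function(v) mean(head(c(v, NA, NA), 3))  # short blocks pick up an NA

blocks_cumsum(5)       # 1 1 1 2 2
mean_head(c(2, 3, 1))  # 2
mean_head(c(2, 5))     # NA
```

The head() trick avoids an explicit branch: padding with NA means any block shorter than 3 includes at least one NA, so mean() returns NA without needing na.rm handling.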
The input data in reproducible form was assumed to be:
Lines <- "id date value
1 2016-10-5 2
1 2016-10-6 3
1 2016-10-7 1
1 2016-10-8 2
1 2016-10-9 5
2 2013-10-6 2"
DF <- read.table(text = Lines, header = TRUE)
DF$date <- as.Date(DF$date)
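The answer relies on the assumption that dates are sorted and consecutive within each id. Below is a minimal sketch for checking that assumption, plus a hypothetical date-based block numbering (counting n calendar days from each id's first date) that agrees with the row-based gl() numbering when the assumption holds. Both function names are illustrative and not from the original answer.

```r
# Hypothetical helpers (not from the original answer).
# TRUE if, within every id, the dates are sorted and exactly one day apart.
consecutive_within_id <- function(date, id) {
  ok <- vapply(split(date, id),
               function(d) all(as.numeric(diff(d), units = "days") == 1),
               logical(1))
  all(ok)
}

# Number blocks of n calendar days, counted from each id's first date.
date_block <- function(date, id, n = 3) {
  days  <- as.numeric(date)          # days since 1970-01-01
  first <- ave(days, id, FUN = min)  # first date of each id, repeated per row
  as.integer((days - first) %/% n) + 1L
}

DF <- data.frame(
  id   = c(1L, 1L, 1L, 1L, 1L, 2L),
  date = as.Date(c("2016-10-05", "2016-10-06", "2016-10-07",
                   "2016-10-08", "2016-10-09", "2013-10-06"))
)
consecutive_within_id(DF$date, DF$id)  # TRUE
date_block(DF$date, DF$id)             # 1 1 1 2 2 1
```

When dates have gaps the two numberings can differ: for dates 2016-10-05, 2016-10-06 and 2016-10-09, date_block gives 1 1 2 while the row-based gl() numbering gives 1 1 1, so which reading the question intends would need confirming.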