
How to find the mean of n consecutive days in each group in R

I have a data frame that contains id (with duplicates), date (with duplicates) and value. The values are recorded on different consecutive days. What I want is to group the data frame by id and by date (in runs of n consecutive days) and find the mean of the values, returning NA if the last group does not contain n days.

id  date          value
 1  2016-10-5       2
 1  2016-10-6       3
 1  2016-10-7       1
 1  2016-10-8       2
 1  2016-10-9       5
 2  2013-10-6       2
 .  .               .
 .  .               .
 .  .               .
 20 2012-2-6        10

Desired output with n = 3 consecutive days:

  id  date      value  group_n_consecutive_days     mean_n_consecutive_days
   1  2016-10-5  2         1                        2
   1  2016-10-6  3         1                        2
   1  2016-10-7  1         1                        2
   1  2016-10-8  2         2                        NA
   1  2016-10-9  5         2                        NA
   2  2013-10-6  2         1                        4
   .
   .
   .
   .
   20 2012-2-6   10        6                        25

The data in the question is sorted and consecutive within id, so we assume that is the case in general. Also, when the question refers to duplicate dates, we assume it means that different id values can share the same date, but within an id the dates are unique and consecutive. Using the data shown reproducibly in Note 2 at the end, group by id and compute the group numbers using gl. Then, grouping by id and group_no, take the mean of each group of 3, or NA for smaller groups.
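As a minimal base-R illustration of the numbering step, assuming a single id with five rows (as id 1 has in the sample data), the following shows the block numbers that gl() produces; inside mutate(), n() supplies the row count. The variable names rows and group_no here are only illustrative.

```r
# Illustrative row count for one id (id 1 in the sample data has 5 rows)
rows <- 5
# gl(rows, 3, rows) builds a factor whose levels 1, 2, ... each repeat 3 times,
# cut to `rows` elements; as.integer() turns it into plain block numbers
group_no <- as.integer(gl(rows, 3, rows))
print(group_no)
```

The last block has only two rows, which is why the mean step returns NA for it.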

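library(dplyr)

DF %>% 
  group_by(id) %>%
  # number consecutive rows within each id in blocks of 3: 1,1,1,2,2,2,...
  # (as.integer() keeps integer codes; c() returns a factor in R >= 4.1.0)
  mutate(group_no = as.integer(gl(n(), 3, n()))) %>%
  # add group_no to the existing id grouping (`.add` replaced `add` in dplyr 1.0.0)
  group_by(group_no, .add = TRUE) %>%
  # mean of complete 3-row blocks; NA_real_ keeps the column double
  mutate(mean = if (n() == 3) mean(value) else NA_real_) %>%
  ungroup()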
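The question asks for n consecutive days, while the pipeline above hard-codes 3. As a hedged sketch (a base-R swap-in using ave(), not the answer's dplyr method), the same logic can take n as a parameter. The function name add_block_means is hypothetical, and like the answer it assumes rows are sorted and dates are consecutive within each id. The sample data from Note 2 is recreated so the block runs on its own.

```r
# Sample data from Note 2, recreated so this block is self-contained
Lines <- "id  date          value
 1  2016-10-5       2
 1  2016-10-6       3
 1  2016-10-7       1
 1  2016-10-8       2
 1  2016-10-9       5
 2  2013-10-6       2"
DF <- read.table(text = Lines, header = TRUE)
DF$date <- as.Date(DF$date)

# Hypothetical helper: split each id's rows into blocks of n consecutive rows
# and attach each block's mean, or NA when a block has fewer than n rows.
# Assumes rows are sorted by date within id and the dates have no gaps.
add_block_means <- function(df, n = 3) {
  # Block number within each id: 1 (n times), 2 (n times), ...
  df$group_no <- ave(seq_len(nrow(df)), df$id,
                     FUN = function(i) (seq_along(i) - 1L) %/% n + 1L)
  # Full-block mean repeated over the block, or NA for a short block
  fill_mean <- function(v) {
    if (length(v) == n) rep(mean(v), n) else rep(NA_real_, length(v))
  }
  df$mean <- ave(as.numeric(df$value), df$id, df$group_no, FUN = fill_mean)
  df
}

add_block_means(DF, n = 3)
```

With n = 3 this gives the same group_no and mean values as the dplyr pipeline for the sample data; add_block_means(DF, n = 2) uses blocks of two days instead.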

giving:

# A tibble: 6 x 5
     id date       value group_no  mean
  <int> <date>     <int>    <int> <dbl>
1     1 2016-10-05     2        1     2
2     1 2016-10-06     3        1     2
3     1 2016-10-07     1        1     2
4     1 2016-10-08     2        2    NA
5     1 2016-10-09     5        2    NA
6     2 2013-10-06     2        1    NA

Note 1

An alternative to gl(...) could be cumsum(rep(1:3, length = n()) == 1), and an alternative to if (n() == 3) mean(value) else NA could be mean(head(c(value, NA, NA), 3)).
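A quick base-R check of the two alternatives, again assuming a single id with five rows: the cumsum()/rep() numbering matches the gl() numbering, and padding with NA makes the mean NA for any block shorter than 3. The helper name pad_mean is hypothetical.

```r
# Illustrative row count for one id (id 1 in the sample data has 5 rows)
rows <- 5
# gl()-based numbering and the cumsum()/rep() alternative
g_gl <- as.integer(gl(rows, 3, rows))
g_cumsum <- cumsum(rep(1:3, length = rows) == 1)
print(identical(g_gl, g_cumsum))

# Hypothetical helper wrapping the padded-mean alternative:
# a block with fewer than 3 values picks up at least one NA, so its mean is NA
pad_mean <- function(value) mean(head(c(value, NA, NA), 3))
print(pad_mean(c(2, 3, 1)))  # first block of id 1
print(pad_mean(c(2, 5)))     # second (short) block of id 1
```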

Note 2

The input data, in reproducible form, was assumed to be:

Lines <- "id  date          value
 1  2016-10-5       2
 1  2016-10-6       3
 1  2016-10-7       1
 1  2016-10-8       2
 1  2016-10-9       5
 2  2013-10-6       2"
DF <- read.table(text = Lines, header = TRUE)
DF$date <- as.Date(DF$date)
