Rolling mean with a varying number of observations

Question

I'm trying to construct a rolling mean over the past 6 months for a dataset. The data is daily and has more than 100,000 rows; a sample is provided below.

# A tibble: 100 × 5
       ID    MONTH       DATE VALUE   R_MEAN
   <fctr>    <dbl>     <date> <dbl>    <dbl>
1     634 20160200 2016-02-03     2 0.000000
2    1700 20150300 2015-03-02     3 0.000000
3    1700 20150400 2015-04-01     7 3.000000
4    1700 20150400 2015-04-09     1 5.000000
5    1700 20150700 2015-07-02    26 3.666667
6    1700 20150800 2015-08-03     1 9.250000
7    1700 20150900 2015-09-01     2 7.600000
8    1700 20151000 2015-10-01     5 7.400000
9    1700 20151000 2015-10-07    10 7.833333
10   1700 20151100 2015-11-02     8 8.800000
# ... with 90 more rows

My goal is to create a moving average over the past 6 months. For example, for an ID X and a DATE of 20160101, I want the average VALUE of all rows that have the same ID and a DATE between 20150601 and 20160101. When no previous values are available, I assume an average of zero.

I thought of using some sort of expanding grid approach, but as I have a lot of IDs (close to 30,000), expanding the grid on a daily basis over a period of 2 years would result in an enormous grid.

Answer 1

Here I use dplyr. I inner_join the table on itself, then, for each row in the source data, filter the relevant previous rows and calculate the mean value.

Finally, I left_join the original data with the processed data and replace NA using coalesce.

The 6-month window is calculated by subtracting 182 days from DATE. You could also use lubridate to express it as a period in months. Personally, I prefer to work with a fixed window of days, which does not depend on the differing number of days in each month.

str <- '
row ID  MONTH DATE  VALUE R_MEAN
1 634 20160200 2016-02-03     2 0.000000
2 1700 20150300 2015-03-02     3 0.000000
3 1700 20150400 2015-04-01     7 3.000000
4 1700 20150400 2015-04-09     1 5.000000
5 1700 20150700 2015-07-02    26 3.666667
6 1700 20150800 2015-08-03     1 9.250000
7 1700 20150900 2015-09-01     2 7.600000
8 1700 20151000 2015-10-01     5 7.400000
9 1700 20151000 2015-10-07    10 7.833333
10  1700 20151100 2015-11-02     8 8.800000
'

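# Read the sample data from the string
raw <- read.table(text = str, header = TRUE)

library(dplyr)

df <- raw %>% mutate(DATE = as.Date(DATE, '%Y-%m-%d'))

# Self-join on ID, keep only rows from the 182 days before each row's DATE,
# then average those previous values per source row
prev <- df %>% inner_join(df, by = 'ID') %>%
  filter(DATE.y > DATE.x - 182, DATE.y < DATE.x) %>%
  group_by(row.x) %>% summarise(meanVALUE = mean(VALUE.y)) %>%
  rename(row = row.x)

# Rows without previous values get a mean of 0
df %>% left_join(prev, by = 'row') %>% mutate(meanVALUE = coalesce(meanVALUE, 0))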

Result:

   row   ID    MONTH       DATE VALUE   R_MEAN meanVALUE
1    1  634 20160200 2016-02-03     2 0.000000  0.000000
2    2 1700 20150300 2015-03-02     3 0.000000  0.000000
3    3 1700 20150400 2015-04-01     7 3.000000  3.000000
4    4 1700 20150400 2015-04-09     1 5.000000  5.000000
5    5 1700 20150700 2015-07-02    26 3.666667  3.666667
6    6 1700 20150800 2015-08-03     1 9.250000  9.250000
7    7 1700 20150900 2015-09-01     2 7.600000  8.750000
8    8 1700 20151000 2015-10-01     5 7.400000  7.500000
9    9 1700 20151000 2015-10-07    10 7.833333  7.000000
10  10 1700 20151100 2015-11-02     8 8.800000  8.800000
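
For the month-based window mentioned above, a minimal sketch could look like the following. It assumes lubridate's `%m-%` operator with `months(6)` for the window start, and it uses a small made-up data frame (not the question's data) so the month-end behaviour is visible. Treat it as an illustration of the idea, not as a drop-in replacement.

```r
library(dplyr)
library(lubridate)

# Hypothetical sample data, chosen to include month-end dates
df <- data.frame(
  row   = 1:4,
  ID    = c(1, 1, 1, 2),
  DATE  = as.Date(c("2015-01-15", "2015-03-31", "2015-08-31", "2015-08-31")),
  VALUE = c(10, 20, 30, 40)
)

# Same self-join as before, but the window start is 6 calendar months back.
# %m-% rolls back to the last valid day of the month when the day does not
# exist (e.g. 2015-08-31 minus 6 months gives 2015-02-28).
prev <- df %>%
  inner_join(df, by = "ID", suffix = c(".x", ".y")) %>%
  filter(DATE.y > DATE.x %m-% months(6), DATE.y < DATE.x) %>%
  group_by(row.x) %>%
  summarise(meanVALUE = mean(VALUE.y)) %>%
  rename(row = row.x)

# Rows without previous values get a mean of 0
out <- df %>%
  left_join(prev, by = "row") %>%
  mutate(meanVALUE = coalesce(meanVALUE, 0))
```

With a calendar window the number of days covered varies from row to row, which is the trade-off against the fixed 182-day window. Recent dplyr versions may also warn about a many-to-many relationship in the self-join; that is expected here.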

Answer 2

Maybe this helps:

   # One mean per ID over the 182 days before today (ID must be a factor)
   for (id in levels(df$ID))
     print(mean(df$VALUE[df$DATE > (Sys.Date() - 182) &
                         df$ID == id],
                na.rm = TRUE))
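
The loop above measures the window from today's date, so it gives one mean per ID. To get a rolling value per row, as the question asks, the same idea can be applied relative to each row's own DATE. The sketch below is only one possible way to do that in base R; `rolling_mean_prev` is a hypothetical helper name, and the sample rows are the first few from the question.

```r
# Hypothetical helper: for each row, the mean of the VALUEs whose DATE lies
# in the window_days strictly before that row's DATE (0 if there are none)
rolling_mean_prev <- function(dates, values, window_days = 182) {
  vapply(seq_along(dates), function(i) {
    in_window <- dates < dates[i] & dates > dates[i] - window_days
    if (any(in_window)) mean(values[in_window], na.rm = TRUE) else 0
  }, numeric(1))
}

# A few rows taken from the question's sample
df <- data.frame(
  ID    = factor(c(1700, 1700, 1700, 634)),
  DATE  = as.Date(c("2015-03-02", "2015-04-01", "2015-04-09", "2016-02-03")),
  VALUE = c(3, 7, 1, 2)
)

# Apply the helper separately to the rows of each ID
df$meanVALUE <- 0
for (idx in split(seq_len(nrow(df)), df$ID)) {
  df$meanVALUE[idx] <- rolling_mean_prev(df$DATE[idx], df$VALUE[idx])
}
```

This only ever compares rows within one ID, so it avoids building a daily grid, at the cost of a quadratic comparison inside each ID.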

Answer 1
2 accepted 2017-02-10 09:13:11

Answer 2
0 2017-02-10 08:47:44
