
Rolling mean with differing number of observations

I'm trying to construct a rolling mean for a dataset over the past 6 months. The data is daily and has more than 100,000 rows; a sample is shown below.

# A tibble: 100 × 5
       ID    MONTH       DATE VALUE   R_MEAN
   <fctr>    <dbl>     <date> <dbl>    <dbl>
1     634 20160200 2016-02-03     2 0.000000
2    1700 20150300 2015-03-02     3 0.000000
3    1700 20150400 2015-04-01     7 3.000000
4    1700 20150400 2015-04-09     1 5.000000
5    1700 20150700 2015-07-02    26 3.666667
6    1700 20150800 2015-08-03     1 9.250000
7    1700 20150900 2015-09-01     2 7.600000
8    1700 20151000 2015-10-01     5 7.400000
9    1700 20151000 2015-10-07    10 7.833333
10   1700 20151100 2015-11-02     8 8.800000
# ... with 90 more rows

My goal is to create a moving average over the past 6 months. For example, for ID X and a DATE value of 20160101, I want the average VALUE of all rows that have the same ID and a DATE between 20150601 and 20160101. When no previous values are available, I assume an average of zero.

I thought of using some sort of expanding-grid approach, but as I have a lot of IDs (close to 30,000), expanding the grid on a daily basis over a period of 2 years would result in an enormous grid.

Here I use dplyr. I inner_join the table on itself, then, for each row in the source data, filter the relevant previous rows and calculate the mean value.

Finally, I left_join the original data to the processed data and replace NA using coalesce.

The 6-month window is calculated by subtracting 182 days from the DATE. You could also use lubridate to make it a period in months. Personally, I prefer to work with a fixed window of days that does not depend on the differing number of days in each month.
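To make the difference between the two window definitions concrete, here is a minimal base R sketch. It uses `seq()` on a Date as a stand-in for the calendar-month arithmetic that lubridate would provide (for example `DATE %m-% months(6)`); the variable names are illustrative only and not part of the original answer.

```r
# One date taken from the sample data
d <- as.Date("2015-09-01")

# Fixed window: always exactly 182 days back
fixed_start <- d - 182

# Calendar window: six calendar months back (base R stand-in for lubridate)
calendar_start <- seq(d, by = "-6 months", length.out = 2)[2]

print(fixed_start)     # 2015-03-03
print(calendar_start)  # 2015-03-01
```

For this date, the sample row dated 2015-03-02 falls inside the calendar window but outside the 182-day window, which is consistent with R_MEAN (7.6) and meanVALUE (8.75) differing for the 2015-09-01 row.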

str <- '
row ID  MONTH DATE  VALUE R_MEAN
1 634 20160200 2016-02-03     2 0.000000
2 1700 20150300 2015-03-02     3 0.000000
3 1700 20150400 2015-04-01     7 3.000000
4 1700 20150400 2015-04-09     1 5.000000
5 1700 20150700 2015-07-02    26 3.666667
6 1700 20150800 2015-08-03     1 9.250000
7 1700 20150900 2015-09-01     2 7.600000
8 1700 20151000 2015-10-01     5 7.400000
9 1700 20151000 2015-10-07    10 7.833333
10  1700 20151100 2015-11-02     8 8.800000
'

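# Read the sample data from the string above
file <- textConnection(str)

raw <- read.table(file, header = TRUE)

library(dplyr)

# Convert DATE from text to Date
df <- raw %>% mutate(DATE = as.Date(DATE, '%Y-%m-%d'))

# Self-join on ID, keep the rows from the 182 days before each source row,
# and average their VALUE per source row
prev <- df %>% inner_join(df, by = 'ID') %>%
  filter(DATE.y > DATE.x - 182, DATE.y < DATE.x) %>%
  group_by(row.x) %>% summarise(meanVALUE = mean(VALUE.y)) %>%
  rename(row = row.x)

# Attach the means to the original data; rows with no earlier values get 0
df %>% left_join(prev, by = 'row') %>% mutate(meanVALUE = coalesce(meanVALUE, 0))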

Result:

   row   ID    MONTH       DATE VALUE   R_MEAN meanVALUE
1    1  634 20160200 2016-02-03     2 0.000000  0.000000
2    2 1700 20150300 2015-03-02     3 0.000000  0.000000
3    3 1700 20150400 2015-04-01     7 3.000000  3.000000
4    4 1700 20150400 2015-04-09     1 5.000000  5.000000
5    5 1700 20150700 2015-07-02    26 3.666667  3.666667
6    6 1700 20150800 2015-08-03     1 9.250000  9.250000
7    7 1700 20150900 2015-09-01     2 7.600000  8.750000
8    8 1700 20151000 2015-10-01     5 7.400000  7.500000
9    9 1700 20151000 2015-10-07    10 7.833333  7.000000
10  10 1700 20151100 2015-11-02     8 8.800000  8.800000
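The same join-then-filter logic can also be sketched in base R without the self-join, which may matter when an ID has many rows, since the join produces every pair of rows within an ID. This is only an illustration under the same assumptions (a 182-day window, strictly earlier dates, 0 when nothing precedes a row); `window_days` is a name introduced here, and the loop is still quadratic within each ID.

```r
# Sample data from the question (row and R_MEAN columns omitted)
df <- data.frame(
  ID    = c(634, rep(1700, 9)),
  DATE  = as.Date(c("2016-02-03", "2015-03-02", "2015-04-01", "2015-04-09",
                    "2015-07-02", "2015-08-03", "2015-09-01", "2015-10-01",
                    "2015-10-07", "2015-11-02")),
  VALUE = c(2, 3, 7, 1, 26, 1, 2, 5, 10, 8)
)

window_days <- 182  # assumed fixed window, as in the answer above
df$meanVALUE <- 0   # default when no earlier rows exist

# Work through the row indices of one ID at a time
for (rows in split(seq_len(nrow(df)), df$ID)) {
  d <- df$DATE[rows]
  v <- df$VALUE[rows]
  df$meanVALUE[rows] <- vapply(seq_along(rows), function(i) {
    # Rows of the same ID strictly before d[i] and within the window
    in_window <- d > d[i] - window_days & d < d[i]
    if (any(in_window)) mean(v[in_window]) else 0
  }, numeric(1))
}

print(df)
```

On the sample data this should reproduce the meanVALUE column shown above.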

Maybe this helps:

   # For each ID, print the mean VALUE over the 182 days before today
   for (id in unique(df$ID))
     print(mean(df$VALUE[df$DATE > (Sys.Date() - 182) &
                         df$ID == id],
                na.rm = TRUE))

