Apply a rolling mean to a database by an index

I would like to calculate a rolling mean on data in a single data frame by multiple ids. See my example dataset below.

date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
          "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",  
          "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
          "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
          "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
           "b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)

df <- data.frame(date, index, x, y, z)

I would like to calculate the rolling mean for x, y and z separately within each index group, first for a and then for b.

I tried the following, but I am getting an error.

test <- tapply(df, df$index, FUN = rollmean(df, 5, fill=NA))

The error:

Error in xu[k:n] - xu[c(1, seq_len(n - k))] : 
  non-numeric argument to binary operator

The problem seems to be that index is a character column, but I need it in order to calculate the means by group.

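1) ave: Try ave rather than tapply, and make sure it is applied only to the columns of interest, i.e. columns 3, 4 and 5. In the original attempt, rollmean(df, 5, fill = NA) is evaluated immediately on the whole data frame, including its non-numeric columns, instead of being passed to tapply as a function, which is where the error comes from.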

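library(zoo)
# centered rolling mean of width 5, padded with NA; applied per index group to columns 3:5
roll <- function(x) rollmean(x, 5, fill = NA)
cbind(df[1:2], lapply(df[3:5], function(x) ave(x, df$index, FUN = roll)))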

giving:

         date index        x         y        z
1  2015-02-01     a       NA        NA       NA
2  2015-02-02     a       NA        NA       NA
3  2015-02-03     a 66.50522 127.45650 129.8472
4  2015-02-04     a 61.71320 123.83633 129.7673
5  2015-02-05     a 56.56125 120.86158 126.1371
6  2015-02-06     a 66.13340 119.93428 127.1819
7  2015-02-07     a 59.56807 105.83208 125.1244
8  2015-02-08     a 49.98779  95.66024 139.2321
9  2015-02-09     a       NA        NA       NA
10 2015-02-10     a       NA        NA       NA
11 2015-02-01     b       NA        NA       NA
12 2015-02-02     b       NA        NA       NA
13 2015-02-03     b 55.71327 117.52219 139.3961
14 2015-02-04     b 54.58450 107.81763 142.6101
15 2015-02-05     b 50.48102 104.94084 136.3167
16 2015-02-06     b 37.89790  95.45489 135.4044
17 2015-02-07     b 33.05259  85.90916 150.8673
18 2015-02-08     b 49.91385  90.04940 147.1376
19 2015-02-09     b       NA        NA       NA
20 2015-02-10     b       NA        NA       NA
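
To see what ave is doing on numbers that are easy to check by hand, here is a minimal sketch. The toy vectors vals and grp, the helper roll3 and the window width of 3 are assumptions made for illustration and are not part of the question's data.

```r
library(zoo)

# Toy data (hypothetical): two groups of five values each
vals <- c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50)
grp  <- rep(c("a", "b"), each = 5)

# Centered rolling mean of width 3, padded with NA at both ends
roll3 <- function(v) rollmean(v, 3, fill = NA)

# ave splits vals by grp, applies roll3 to each piece and
# returns the results in the original row order
res <- ave(vals, grp, FUN = roll3)
print(res)
```

Each group gets its own NA padding, so a window never mixes values from a and b.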

2) by: Another way is to use by. roll2 handles one group, by applies it to each group, producing a by list, and do.call("rbind", ...) puts the pieces back together.

roll2 <- function(x) cbind(x[1:2], rollmean(x[3:5], 5, fill = NA))
do.call("rbind", by(df, df$index, roll2))

giving:

           date index        x         y        z
a.1  2015-02-01     a       NA        NA       NA
a.2  2015-02-02     a       NA        NA       NA
a.3  2015-02-03     a 66.50522 127.45650 129.8472
a.4  2015-02-04     a 61.71320 123.83633 129.7673
a.5  2015-02-05     a 56.56125 120.86158 126.1371
a.6  2015-02-06     a 66.13340 119.93428 127.1819
a.7  2015-02-07     a 59.56807 105.83208 125.1244
a.8  2015-02-08     a 49.98779  95.66024 139.2321
a.9  2015-02-09     a       NA        NA       NA
a.10 2015-02-10     a       NA        NA       NA
b.11 2015-02-01     b       NA        NA       NA
b.12 2015-02-02     b       NA        NA       NA
b.13 2015-02-03     b 55.71327 117.52219 139.3961
b.14 2015-02-04     b 54.58450 107.81763 142.6101
b.15 2015-02-05     b 50.48102 104.94084 136.3167
b.16 2015-02-06     b 37.89790  95.45489 135.4044
b.17 2015-02-07     b 33.05259  85.90916 150.8673
b.18 2015-02-08     b 49.91385  90.04940 147.1376
b.19 2015-02-09     b       NA        NA       NA
b.20 2015-02-10     b       NA        NA       NA

3) wide form: Another approach is to convert df from long form to wide form, in which case a plain rollmean will do it.

rollmean(read.zoo(df, split = 2), 5, fill = NA)

giving:

                x.a       y.a      z.a      x.b       y.b      z.b
2015-02-01       NA        NA       NA       NA        NA       NA
2015-02-02       NA        NA       NA       NA        NA       NA
2015-02-03 66.50522 127.45650 129.8472 55.71327 117.52219 139.3961
2015-02-04 61.71320 123.83633 129.7673 54.58450 107.81763 142.6101
2015-02-05 56.56125 120.86158 126.1371 50.48102 104.94084 136.3167
2015-02-06 66.13340 119.93428 127.1819 37.89790  95.45489 135.4044
2015-02-07 59.56807 105.83208 125.1244 33.05259  85.90916 150.8673
2015-02-08 49.98779  95.66024 139.2321 49.91385  90.04940 147.1376
2015-02-09       NA        NA       NA       NA        NA       NA
2015-02-10       NA        NA       NA       NA        NA       NA

This works because the dates are the same for both groups. If the dates were different, the conversion to wide form could introduce NAs, and rollmean cannot handle those. In that case use

rollapply(read.zoo(df, split = 2), 5, mean, fill = NA)
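
As a small illustration of how rollapply behaves when a series has gaps, the sketch below runs it on a toy vector. The vector v and the width of 3 are assumptions for illustration. Passing na.rm = TRUE through to mean is an optional variation that the answer does not use; it averages whatever non-missing values remain in each window.

```r
library(zoo)

# Toy series (hypothetical) with one missing value in the middle
v <- c(1, 2, NA, 4, 5)

# As in the answer: any window that contains an NA yields NA
strict <- rollapply(v, 3, mean, fill = NA)

# Variation: extra arguments are passed on to mean, so na.rm = TRUE
# averages only the non-missing values in each window
lenient <- rollapply(v, 3, mean, na.rm = TRUE, fill = NA)

print(strict)
print(lenient)
```

Which of the two is appropriate depends on whether a partly missing window should count as missing.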

Note: Since the input uses random numbers in its definition, we must call set.seed first to make it reproducible. We used this:

set.seed(123)
date <- as.Date(c("2015-02-01", "2015-02-02", "2015-02-03", "2015-02-04", 
          "2015-02-05", "2015-02-06", "2015-02-07", "2015-02-08",  
          "2015-02-09", "2015-02-10", "2015-02-01", "2015-02-02", 
          "2015-02-03", "2015-02-04", "2015-02-05", "2015-02-06", 
          "2015-02-07", "2015-02-08", "2015-02-09", "2015-02-10"))
index <- c("a","a","a","a","a","a","a","a","a","a",
           "b","b","b","b","b","b","b","b","b","b")
x <- runif(20,1,100)
y <- runif(20,50,150)
z <- runif(20,100,200)
df <- data.frame(date, index, x, y, z)

This ought to do the trick, using the dplyr and zoo packages:

library(dplyr)
library(zoo)

df %>% 
  group_by(index) %>% 
  mutate(x_mean = rollmean(x, 5, fill = NA),
         y_mean = rollmean(y, 5, fill = NA),
         z_mean = rollmean(z, 5, fill = NA))

You could probably tidy this up further using mutate_each (since deprecated in dplyr in favor of across()) or some other form of mutate.
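
One way to avoid repeating the rollmean call per column is sketched below with across(), assuming dplyr 1.0.0 or later. The toy data frame, the window width of 3 and the "{.col}_mean" naming pattern are assumptions for illustration, not part of the original answer.

```r
library(dplyr)
library(zoo)

# Toy data (hypothetical): two groups, two numeric columns
toy <- data.frame(
  index = rep(c("a", "b"), each = 5),
  x = c(1, 2, 3, 4, 5, 10, 20, 30, 40, 50),
  y = c(2, 4, 6, 8, 10, 1, 1, 1, 1, 1)
)

out <- toy %>%
  group_by(index) %>%
  # apply the same centered rolling mean to x and y within each group,
  # writing the results to x_mean and y_mean
  mutate(across(c(x, y), ~ rollmean(.x, 3, fill = NA),
                .names = "{.col}_mean")) %>%
  ungroup()

print(out)
```

With the question's data, the same pattern would use c(x, y, z) and a width of 5.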

You can also change the arguments to rollmean to fit your needs, such as align = "right" or na.pad = TRUE (in current zoo versions na.pad is deprecated in favor of fill = NA).
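
To show what the align argument changes, here is a short sketch on a toy vector. The vector v and the width of 3 are assumptions for illustration.

```r
library(zoo)

v <- c(1, 2, 3, 4, 5, 6)

# Default alignment is "center": each result sits at the middle of its window
centered <- rollmean(v, 3, fill = NA)

# align = "right": each result sits at the last element of its window,
# i.e. a trailing mean that only looks backwards
trailing <- rollmean(v, 3, fill = NA, align = "right")

print(centered)
print(trailing)
```

A trailing window is often what is wanted for time series, since each value then depends only on the current and earlier observations.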
