
Apply a function to a rolling window in panel data in R

I'm trying to apply a function (say, standard deviation) over a rolling window, by category.

I have the following data:

cat = c("A", "A", "A", "A", "B", "B", "B", "B") 
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) 
value = c(2, 3, 5, 6, 8, 9, 4, 5) 
df = data.frame(cat, year, value)

I would like to create a new column (say sd) that holds the standard deviation over a two-year window, computed separately for each cat.

Here's the result I'm thinking of:

[image of the expected result; not reproduced here]

Any advice on how to achieve this?

It can be done by using rollapply from the zoo package:

library(zoo)

cat = c("A", "A", "A", "A", "B", "B", "B", "B") 
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993) 
value = c(2, 3, 5, 6, 8, 9, 4, 5) 
df = data.frame(cat, year, value)

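# by() splits df by cat and returns the pieces in factor-level order,
# so this assumes the rows of df are already sorted by cat.
df$stdev <- unlist(by(df, df$cat, function(x) {
  # width=2 yields one value fewer than the input; pad the first year with NA
  c(NA, rollapply(x$value, width=2, sd))
}), use.names=FALSE)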

print(df)
##   cat year value     stdev
## 1   A 1990     2        NA
## 2   A 1991     3 0.7071068
## 3   A 1992     5 1.4142136
## 4   A 1993     6 0.7071068
## 5   B 1990     8        NA
## 6   B 1991     9 0.7071068
## 7   B 1992     4 3.5355339
## 8   B 1993     5 0.7071068
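
To make explicit what the rolling window computes, here is a minimal base-R sketch that does not need zoo. The helper roll_stat is a hypothetical name introduced only for illustration (it is not part of any package); it is combined with ave so each cat is handled separately. It assumes the rows within each cat are in year order.

```r
# Hypothetical helper (not from zoo): right-aligned rolling statistic.
# Positions without a full window of `width` values are left as NA.
roll_stat <- function(x, width, FUN) {
  n <- length(x)
  out <- rep(NA_real_, n)
  if (n >= width) {
    for (i in width:n) {
      out[i] <- FUN(x[(i - width + 1):i])
    }
  }
  out
}

cat = c("A", "A", "A", "A", "B", "B", "B", "B")
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993)
value = c(2, 3, 5, 6, 8, 9, 4, 5)
df = data.frame(cat, year, value)

# ave() applies the function within each cat and puts the results
# back in the original row positions.
df$stdev <- ave(df$value, df$cat, FUN = function(x) roll_stat(x, 2, sd))
print(df)
```

For the sample data this should give the same stdev column as the rollapply version above.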

You can also do it with ddply if you'd rather use plyr functions than by:

library(plyr)

# ddply() also returns groups in sorted cat order, so df should be sorted by cat
df$stdev <- ddply(df, .(cat), summarise, 
                  stdev=c(NA, rollapply(value, width=2, sd)))$stdev
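
A possible variant, sketched here as an alternative rather than as part of the original answer: zoo also provides rollapplyr (rollapply with align = "right"), and its fill argument can supply the leading NA so the manual c(NA, ...) padding is not needed. Paired with ave, the result is written back by row position, so the rows do not have to be sorted by cat, although they should still be in year order within each cat.

```r
library(zoo)

cat = c("A", "A", "A", "A", "B", "B", "B", "B")
year = c(1990, 1991, 1992, 1993, 1990, 1991, 1992, 1993)
value = c(2, 3, 5, 6, 8, 9, 4, 5)
df = data.frame(cat, year, value)

# Make sure years are in order within each cat before rolling.
df <- df[order(df$cat, df$year), ]

# fill = NA pads the positions that lack a full two-year window.
df$stdev <- ave(df$value, df$cat, FUN = function(x) {
  rollapplyr(x, width = 2, FUN = sd, fill = NA)
})
print(df)
```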

As a lark, I ran system.time (multiple times) to compare the two methods above with the ave method pointed out by @thelatemail in the comment thread below this answer, starting each run from a "fresh" copy of the data frame.

df <- data.frame(cat, year, value)
system.time(df$stdev <- with(df, ave(value, cat, FUN=function(x) c(NA, rollapply(x, width=2, sd)))))

df <- data.frame(cat, year, value)
system.time(df$stdev <- unlist(by(df, df$cat, function(x) c(NA, rollapply(x$value, width=2, sd))), use.names=FALSE))

df <- data.frame(cat, year, value)
system.time(df$stdev <- ddply(df, .(cat), summarise, stdev=c(NA, rollapply(value, width=2, sd)))$stdev)

Both the ave and by methods take:

   user  system elapsed 
  0.002   0.000   0.002 

and the ddply version takes:

   user  system elapsed 
  0.004   0.000   0.004 

Not that speed is really an issue here, but on this tiny example the ave and by versions ran about twice as fast as the ddply version.
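
Timings of a few milliseconds on eight rows sit close to the resolution of system.time, so a larger input gives a steadier comparison. The sketch below is only an illustration: the data frame big, its size (200 categories by 50 years), the random values, and the helper name roll2 are all assumptions made up for this example, and the numbers it prints will differ by machine.

```r
library(zoo)

set.seed(42)
n_cat  <- 200   # assumed number of categories
n_year <- 50    # assumed number of years per category

# Synthetic panel, already sorted by cat and year.
big <- data.frame(
  cat   = rep(sprintf("C%03d", seq_len(n_cat)), each = n_year),
  year  = rep(seq(1970, length.out = n_year), times = n_cat),
  value = rnorm(n_cat * n_year)
)

# Same two-year rolling sd as above, padded with a leading NA.
roll2 <- function(x) c(NA, rollapply(x, width = 2, sd))

# Repeat each method a few times inside system.time to smooth out noise.
t_ave <- system.time(for (i in 1:5) {
  res_ave <- with(big, ave(value, cat, FUN = roll2))
})
t_by <- system.time(for (i in 1:5) {
  res_by <- unlist(by(big, big$cat, function(g) roll2(g$value)),
                   use.names = FALSE)
})

print(t_ave)
print(t_by)
```

Both methods should produce the same column; only the elapsed times are of interest.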
