
Rolling sum on an unbalanced time series

I have a series of annual incident counts per category, with no rows for years in which a category had no incidents. I would like to add a column that shows, for each year, how many incidents occurred in the previous three years.

One way to handle this is to add empty rows for all years with zero incidents, then use rollapply() with a left-aligned four-year window, but that would expand my data set more than I want. Surely there's a way to use ddply() and transform for this?
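For reference, the padding approach described above could look roughly like the following sketch. It uses only base R rather than rollapply(), and the names pad_and_roll, window, and prev_incidents are made up for illustration. The default window of three preceding years follows the description in the question and is an assumption.

```r
# Hypothetical helper: pad each category with zero-incident years, then
# sum the incidents of the `window` years preceding each year.
pad_and_roll <- function(dat, window = 3) {
  # Every category/year combination over the observed year range.
  full <- expand.grid(
    category = unique(dat$category),
    year = seq(min(dat$year), max(dat$year)),
    stringsAsFactors = FALSE
  )
  padded <- merge(full, dat, by = c("category", "year"), all.x = TRUE)
  padded$incidents[is.na(padded$incidents)] <- 0
  padded <- padded[order(padded$category, padded$year), ]

  # Within a category the years are now consecutive, so a difference of
  # cumulative sums gives the total over the preceding `window` years.
  padded$prev_incidents <- ave(
    padded$incidents, padded$category,
    FUN = function(v) {
      cs <- c(0, cumsum(v))   # cs[i] is the sum of v[1], ..., v[i - 1]
      i <- seq_along(v)
      cs[i] - cs[pmax(i - window, 1)]
    }
  )
  padded
}
```

The result has one row per category and year in the observed range, which is exactly the growth in row count that the question wants to avoid.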

The following code builds a dummy data set, then runs a simple plyr sum by category:

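library(plyr)  # provides ddply() and .()

# Dummy data: three categories, with gaps in the years
dat <- data.frame(
   category=c(rep('A',6), rep('B',6), rep('C',6)), 
   year=rep(c(2000,2001,2004,2005,2009, 2010),3), 
   incidents=rpois(18, 3)
   )

# Per-category total, repeated on every row of the category
ddply(dat, .(category) , transform, i_per_c=sum(incidents) )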

That works, but it only shows a per-category total.

I want a total that's year-dependent.

So I tried to expand the ddply() call with the function() syntax, like so:

ddply(dat, .(category) , transform, 
      function(x) i_per_c=sum(ifelse(x$year >= year - 4 & x$year < year,  x$incidents, 0) )
      )

This just returns the original data frame, unmodified.

I must be missing something in the plyr syntax, but I don't know what it is.

Thanks, Matt

This is sorta ugly, but it works. Nested ply calls:

ddply(dat, .(category), 
    function(datc) adply(datc, 1, 
         function(x) data.frame(run_incidents =
                                sum(subset(datc, year>(x$year-2) & year<=x$year)$incidents))))

There might be a slightly cleaner way to do it, and there are definitely ways that execute much faster.
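One possible variant, given here only as a sketch, drops plyr and computes the windowed sum per row with base R, so it does not build a data frame for every row. The helper name prev_window_sum, its window argument, and the prev_incidents column are hypothetical. The window covers the `window` years strictly before each row's year, as in the question's attempt (which uses 4 in its code and "three" in its text); the nested call above instead includes the current year.

```r
# Hypothetical helper: for each row, sum the incidents recorded for the same
# category in the `window` years before that row's year. No rows are added.
prev_window_sum <- function(dat, window = 3) {
  out <- numeric(nrow(dat))
  # Row indices of each category, so results go back in the original order.
  for (idx in split(seq_len(nrow(dat)), dat$category)) {
    yr <- dat$year[idx]
    inc <- dat$incidents[idx]
    out[idx] <- vapply(yr, function(y) {
      sum(inc[yr >= y - window & yr < y])
    }, numeric(1))
  }
  out
}

# Small deterministic example (made-up numbers, not the data from the question)
dat <- data.frame(
  category = c("A", "A", "A", "B", "B"),
  year = c(2000, 2001, 2004, 2000, 2004),
  incidents = c(1, 2, 4, 10, 20)
)
dat$prev_incidents <- prev_window_sum(dat)
```

Because the comparison is made on the year values themselves, gaps in the series need no padding; changing window to 4 would match the bounds used in the question's code.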
