
Rolling Mean with Plyr

I am trying to calculate a rolling mean using plyr. The data is at the industry-country-year level, with repeated observations for each industry-country pair. The panel is unbalanced, but most industry-country pairs have approximately 15 observations.

For example, the data looks like this:

country       ISIC      Year      Value
Algeria        1        1990       400
Algeria        1        1991       450
Algeria        1        1992       460
Algeria        2        1990       450
Algeria        2        1991       500
Algeria        2        1992       450
Argentina      1        1990       400
Argentina      1        1991       450
Argentina      1        1992       460
Argentina      2        1990       450
Argentina      2        1991       500
Argentina      2        1992       450
.              .        .          .
.              .        .          .
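A minimal sketch for reproducing the examples in this thread: the rows shown above (and only those; the elided rows are not reconstructed) can be built as a data frame. The name dat is an assumption borrowed from the answer further down.

```r
# Sample rows from the table above; the elided ("...") rows are not included
dat <- data.frame(
  country = rep(c("Algeria", "Argentina"), each = 6),
  ISIC    = rep(rep(c(1, 2), each = 3), times = 2),
  Year    = rep(1990:1992, times = 4),
  Value   = rep(c(400, 450, 460, 450, 500, 450), times = 2),
  stringsAsFactors = FALSE
)
dat
```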

If I subset the data to a specific industry and country, I am able to calculate the rolling mean like this:

library(zoo)  # rollmean() comes from the zoo package
rollmean(subdata$Value, 3)
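As a sketch of that single-group step, assuming the sample rows above are stored in a data frame named dat (the name and the subsetting code are illustrative, not the asker's), one group can be pulled out and sorted by Year before calling rollmean, since the window runs over the rows in their current order:

```r
library(zoo)  # provides rollmean()

# Sample rows from the question's table (elided rows not included)
dat <- data.frame(
  country = rep(c("Algeria", "Argentina"), each = 6),
  ISIC    = rep(rep(c(1, 2), each = 3), times = 2),
  Year    = rep(1990:1992, times = 4),
  Value   = rep(c(400, 450, 460, 450, 500, 450), times = 2),
  stringsAsFactors = FALSE
)

# One industry-country group, sorted so the window runs over consecutive years
subdata <- subset(dat, country == "Algeria" & ISIC == 1)
subdata <- subdata[order(subdata$Year), ]

r <- rollmean(subdata$Value, 3)
r  # a single value, because this sample group has exactly 3 rows
```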

However, I've been unable to get it to work with plyr so as to calculate the rolling mean for each industry-country group. I've tried:

roll <- ddply(data, .(country, ISIC), summarize, rollmean(data$Value, 3))

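A rolling mean necessarily shortens the data (a window of k over n values yields n - k + 1 results), which is part of why you get the error. Also, data$Value refers to the whole data frame rather than the current group, so the code below works on each group's own data frame, df, instead.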
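A small illustration of the shortening, using hypothetical values rather than the question's data: with a window of k = 3, rollmean returns n - k + 1 values, so the result cannot line up with a group's rows unless it is padded. Passing fill = NA keeps the length equal to n.

```r
library(zoo)

x <- c(400, 450, 460, 470, 480)  # hypothetical series, not from the question
k <- 3

short  <- rollmean(x, k)             # n - k + 1 = 3 values
padded <- rollmean(x, k, fill = NA)  # same length as x, NA where no full window fits

length(short)
length(padded)
padded
```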

library(plyr)
library(zoo)
ddply(dat, .(country, ISIC), function(df) data.frame(country=unique(df$country),
                                                     ISIC=unique(df$ISIC),
                                                     rolled=rollmean(df$Value, 3)))
    country ISIC   rolled
1   Algeria    1 436.6667
2   Algeria    2 466.6667
3 Argentina    1 436.6667
4 Argentina    2 466.6667

However, if you're doing a rolling mean over 3 samples and each group only has 3 samples, you're just calculating the mean:

ddply(dat, .(country, ISIC), summarise, mean(Value))

    country ISIC      ..1
1   Algeria    1 436.6667
2   Algeria    2 466.6667
3 Argentina    1 436.6667
4 Argentina    2 466.6667
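To check that claim on the sample rows, a sketch (again assuming the data frame is called dat; the column names rolled and avg are illustrative) computes both summaries per group and compares them. Inside summarise the column is referenced as Value, so each group uses its own rows:

```r
library(plyr)
library(zoo)

# Sample rows from the question's table (elided rows not included)
dat <- data.frame(
  country = rep(c("Algeria", "Argentina"), each = 6),
  ISIC    = rep(rep(c(1, 2), each = 3), times = 2),
  Year    = rep(1990:1992, times = 4),
  Value   = rep(c(400, 450, 460, 450, 500, 450), times = 2),
  stringsAsFactors = FALSE
)

# 3-point rolling mean per group; each sample group has exactly 3 rows,
# so rollmean() returns one value per group here
rolled <- ddply(dat, .(country, ISIC), summarise, rolled = rollmean(Value, 3))

# Plain per-group mean for comparison
means <- ddply(dat, .(country, ISIC), summarise, avg = mean(Value))

all.equal(rolled$rolled, means$avg)
```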

UPDATED FOR COMMENTS:

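To keep the dates (the Year column), you can use the na.pad argument to rollmean, which pads the result with NA so it has the same length as the group. In current versions of zoo, na.pad is deprecated in favor of fill = NA.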

ddply(dat, .(country, ISIC), function(df) {df$rolled <- rollmean(df$Value, 3, na.pad=TRUE); return(df)})
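A variant of the same idea, sketched with fill = NA in place of na.pad = TRUE and an explicit sort by Year inside each group (both are additions, not part of the original answer), again assuming the sample rows are in dat:

```r
library(plyr)
library(zoo)

# Sample rows from the question's table (elided rows not included)
dat <- data.frame(
  country = rep(c("Algeria", "Argentina"), each = 6),
  ISIC    = rep(rep(c(1, 2), each = 3), times = 2),
  Year    = rep(1990:1992, times = 4),
  Value   = rep(c(400, 450, 460, 450, 500, 450), times = 2),
  stringsAsFactors = FALSE
)

rolled <- ddply(dat, .(country, ISIC), function(df) {
  df <- df[order(df$Year), ]                     # windows assume chronological order
  df$rolled <- rollmean(df$Value, 3, fill = NA)  # NA-padded, so length == nrow(df)
  df
})
rolled
```

With the default align = "center", each average is placed on the middle year of its window (1991 here); align = "right" would place it on the last year of the window instead, giving a trailing mean.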
