
R timeseries - identify missing observations (timestamps) and insert NAs to create time series of given length

I have a set of 24 grouped (hierarchical) time series that are supposed to run over 3 years, and I want to look at monthly sales. It turns out that a number of them have missing observations, e.g.

getCounts(Shop1, ...)
2011-01 2011-02 2011-03 2011-04 2011-05 2011-06 2011-07 2011-08 2011-09 2011-10 2011-11 2011-12 2012-02 2012-03 2012-04 2012-05 2012-06 2012-07 2012-08 2012-09 2012-10 2012-11 
 10      22      10      12      36      31      25      19       7       7       7       5       1       9       9      11      10      16      25       3       2       5 

is missing an observation for January 2012 and ends in November 2012, although it is supposed to run to December 2013.

getCounts uses the command

with(myDF, tapply(varName, substr(dateName, 1, 7), sum))

to get the monthly counts.

I want to replace the missing observations, both in the middle of the time series and at the end, with NAs, so that all my time series have the same number of observations and any "holes" are visible in a plot.

Can anybody help me do this?

Thanks!

Edit: My preferred output would be something like this:

      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2011   1  NA   2   3   4   5   6  NA   7   8   9  10
2012   2   3   4   5   6  NA  NA  NA  NA  NA  NA  NA

where each NA replaces a missing observation.

Edit 2: getCounts() looks like this:

getCounts <- function(dataObject, dateName, varName) {
  dataNameString <- deparse(substitute(dataObject))
  countsStr <- paste0("with(", dataNameString, ", tapply(", varName, ", substr(", dateName, ", 1, 7), sum))")
  counts <- eval(parse(text = countsStr))
  return(counts)
}

And here's the dput:

structure(c(10, 22, 10, 12, 36, 31, 25, 19, 7, 7, 7, 5, 1, 9, 
9, 11, 10, 16, 25, 3, 2, 5), .Dim = 22L, .Dimnames = list(c("2011-01", 
"2011-02", "2011-03", "2011-04", "2011-05", "2011-06", "2011-07", 
"2011-08", "2011-09", "2011-10", "2011-11", "2011-12", "2012-02", 
"2012-03", "2012-04", "2012-05", "2012-06", "2012-07", "2012-08", 
"2012-09", "2012-10", "2012-11")))

Try this:

df <- data.frame(Year = substr(names(x), 1, 4),
                 Month = factor(month.abb[as.numeric(substr(names(x), 6, 7))], 
                         levels = month.abb),
                 Value = x)

library(tidyr)
spread(df, Month, Value)
#   Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
# 1 2011  10  22  10  12  36  31  25  19   7   7   7   5
# 2 2012  NA   1   9   9  11  10  16  25   3   2   5  NA
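Note that this result only covers the years that actually appear in the data, so the months from December 2012 through December 2013 are still absent. A minimal base R sketch, assuming the series should span 2011-01 through 2013-12 (the range stated in the question), builds the full set of month labels and looks each one up in the observed counts. The names `all_months`, `padded` and `sales_ts` are made up for this illustration.

```r
# Monthly counts from the question, named by "YYYY-MM"
x <- structure(c(10, 22, 10, 12, 36, 31, 25, 19, 7, 7, 7, 5, 1, 9,
                 9, 11, 10, 16, 25, 3, 2, 5), .Dim = 22L,
               .Dimnames = list(c("2011-01", "2011-02", "2011-03", "2011-04",
                                  "2011-05", "2011-06", "2011-07", "2011-08",
                                  "2011-09", "2011-10", "2011-11", "2011-12",
                                  "2012-02", "2012-03", "2012-04", "2012-05",
                                  "2012-06", "2012-07", "2012-08", "2012-09",
                                  "2012-10", "2012-11")))

# Every month the series is supposed to cover
# (assumed range: January 2011 to December 2013)
all_months <- format(seq(as.Date("2011-01-01"), as.Date("2013-12-01"),
                         by = "month"), "%Y-%m")

# match() gives NA for months with no observation, so those slots become NA
padded <- as.vector(x)[match(all_months, names(x))]
names(padded) <- all_months

# A monthly ts object prints as a year-by-month table (Jan ... Dec)
sales_ts <- ts(unname(padded), start = c(2011, 1), frequency = 12)
print(sales_ts)
```

Calling `plot(sales_ts)` on the result should then draw the line with breaks at the NA months, which makes the holes visible.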

Data 数据

x <- structure(c(10, 22, 10, 12, 36, 31, 25, 19, 7, 7, 7, 5, 1, 9, 
                 9, 11, 10, 16, 25, 3, 2, 5), .Dim = 22L, .Dimnames = list(c("2011-01", 
                 "2011-02", "2011-03", "2011-04", "2011-05", "2011-06", "2011-07", 
                 "2011-08", "2011-09", "2011-10", "2011-11", "2011-12", "2012-02", 
                 "2012-03", "2012-04", "2012-05", "2012-06", "2012-07", "2012-08", 
                 "2012-09", "2012-10", "2012-11")))
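
Since the question involves 24 series that should all have the same length, the same lookup can be wrapped in a small function and applied to each series. This is only a sketch: `pad_months()` is a hypothetical helper, the default date range is taken from the question, and `Shop2` is invented data used purely to show a second column.

```r
# Shop1 counts from the question, named by "YYYY-MM"
x <- structure(c(10, 22, 10, 12, 36, 31, 25, 19, 7, 7, 7, 5, 1, 9,
                 9, 11, 10, 16, 25, 3, 2, 5), .Dim = 22L,
               .Dimnames = list(c("2011-01", "2011-02", "2011-03", "2011-04",
                                  "2011-05", "2011-06", "2011-07", "2011-08",
                                  "2011-09", "2011-10", "2011-11", "2011-12",
                                  "2012-02", "2012-03", "2012-04", "2012-05",
                                  "2012-06", "2012-07", "2012-08", "2012-09",
                                  "2012-10", "2012-11")))

# Hypothetical helper: pad a named vector of monthly counts to a fixed range,
# inserting NA for every month that has no observation
pad_months <- function(counts, from = "2011-01", to = "2013-12") {
  months <- format(seq(as.Date(paste0(from, "-01")),
                       as.Date(paste0(to, "-01")), by = "month"), "%Y-%m")
  out <- as.vector(counts)[match(months, names(counts))]
  names(out) <- months
  out
}

# Shop1 is the real data; Shop2 is made up for illustration only
shops <- list(
  Shop1 = x,
  Shop2 = c("2011-03" = 4, "2013-12" = 8)
)

# One column per shop, one row per month; every column has 36 rows
padded <- sapply(shops, pad_months)

# Multivariate monthly time series with a common time axis
sales_ts <- ts(padded, start = c(2011, 1), frequency = 12)
```

Because every series is padded against the same month labels, the columns line up by date regardless of where each shop's observations start or stop.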
