R: Aggregating groups of irregular-length time series

Question

I think this is a split-apply-combine problem, but with a time series twist. My data consists of irregular counts and I need to compute some summary statistics on each group of counts. Here's a snapshot of the data:


And here it is for your console:

library(xts)

date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
              "2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
              "2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
              "2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
              "2011-03-26", "2011-03-27"))

returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
         -0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
         0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
          0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
          0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)

xtsData <- xts(cbind(returns,count,maxCount,sumCount),date)

I have no idea how to construct the maxCount and sumCount columns, especially since each count series has an irregular length. Since I won't always know the start and end points of a count series, I'm lost trying to figure out the index of these groups. Thanks for your help!

UPDATE: here is my for loop attempting to calculate the cumulative sum. It does not compute the cumulative sum yet; it only picks out the returns that belong to a count series. I'm still unsure how to apply functions to these ranges!

xtsData <- cbind(xtsData,mySumCount=NA)
# find groups of returns
for(i in 1:nrow(xtsData)){
  if(is.na(xtsData[i,"count"]) == FALSE){
    xtsData[i,"mySumCount"] <- xtsData[i,"returns"]
  }
  else{
   xtsData[i,"mySumCount"] <- NA
  }
}

UPDATE 2: thank you, commenters!

# report returns when not NA count
x1 <- xtsData[!is.na(xtsData$count),"returns"]

# cum sum is close, but still need to exclude the first element
# -0.009 in the first series of counts and .027 in the second series of counts
x2 <- cumsum(xtsData[!is.na(xtsData$count),"returns"]) 

# this output is not accurate because .03 is being displayed down the entire column, not just during periods when counts != NA. is this just a rounding error?
x3 <- max(xtsData[!is.na(xtsData$count),"returns"])


SOLUTION:

# function to pad a vector with a 0
lagpad <- function(x, k) {
  c(rep(0, k), x)[1 : length(x)] 
}

# group the counts
x1 <- na.omit(transform(xtsData, g =  cumsum(c(0, diff(!is.na(count)) == 1))))

# cumulative sum of the count series
z1 <- transform(x1, cumsumRet = ave(returns, g, FUN =function(x) cumsum(replace(x, 1, 0))))
# max of the count series
z2 <- transform(x1, maxRet = ave(returns, g, FUN =function(x) max(lagpad(x,1))))

# merge the new columns back onto the original series
merge(xtsData,z1$cumsumRet,z2$maxRet)


Answer 1

The code shown is not consistent with the output in the image and no explanation is provided, so it's not clear what manipulations were wanted; however, the question did mention that the main problem is distinguishing the groups, so we will address that.

To do that we compute a new column g whose rows contain 1 for the first group, 2 for the second and so on. We also remove the NA rows, since the g column is sufficient to distinguish the groups.

The following code computes a vector of the same length as count by first setting each NA position to FALSE and each non-NA position to TRUE. It then differences each position of that vector with the prior position. To do that it implicitly converts FALSE to 0 and TRUE to 1 and then performs the differencing. Next we convert this last result to a logical vector which is TRUE for each component equal to 1 and FALSE otherwise. Since the first component of the vector that is differenced has no prior position, we prepend 0 for it. The prepending operation implicitly converts the TRUE and FALSE values just generated to 1 and 0 respectively. Taking the cumsum fills in the first group with 1, the second with 2 and so on. Finally we omit the NA rows:

x <- na.omit(transform(xtsData, g =  cumsum(c(0, diff(!is.na(count)) == 1))))

giving:

> x
           returns count maxCount sumCount g
2010-11-26  -0.009     1    0.030    0.000 1
2010-12-03   0.030     1    0.030    0.030 1
2010-12-10   0.013     2    0.030    0.042 1
2010-12-17   0.003     2    0.030    0.045 1
2010-12-24   0.010     3    0.030    0.056 1
2010-12-31   0.001     4    0.030    0.056 1
2011-01-07   0.011     5    0.030    0.067 1
2011-01-14   0.017     6    0.030    0.084 1
2011-01-21  -0.008     7    0.030    0.077 1
2011-01-28  -0.005     7    0.030    0.071 1
2011-02-04   0.027     7    0.030    0.098 1
2011-02-11   0.014     7    0.030    0.112 1
2011-02-18   0.010     7    0.030    0.123 1
2011-03-18   0.027     1    0.027    0.000 2
2011-03-25  -0.019     2    0.027   -0.019 2
attr(,"na.action")
2010-11-18 2010-11-19 2011-02-25 2011-03-04 2011-03-11 2011-03-26 2011-03-27 
         1          2         16         17         18         21         22 
attr(,"class")
[1] "omit"
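
The grouping step can also be traced on the count vector alone. This is only an illustrative sketch in base R; the intermediate names present, change, starts and g are made up here and do not appear in the answer's one-liner.

```r
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

present <- !is.na(count)      # TRUE where a count exists, FALSE at NA
change  <- diff(present)      # 1 where a run starts, -1 where it ends, 0 otherwise
starts  <- c(0, change == 1)  # prepend 0 so the length matches count again
g       <- cumsum(starts)     # number of runs started up to each row

g[present]                    # group ids of the rows that na.omit would keep
```

The NA rows that follow a run carry that run's id as well, which is why they are dropped afterwards. Also note that if the series started with a non-NA count, the first run would be numbered 0 rather than 1; the groups would still be distinct.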

You can now use ave to perform any calculations you like. For example, to take cumulative sums of returns by group:

transform(x, cumsumRet = ave(returns, g, FUN = cumsum))

Replace cumsum with any other function that is suitable for use with ave.
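
As a rough illustration of swapping in other functions, the sketch below uses a plain data.frame instead of the xts object so that it runs with base R only. The column names maxRet and cumsumRet are borrowed from the SOLUTION in the question; the data frame name df is an assumption made for this example.

```r
returns <- c(0.002, 0.000, -0.009, 0.030, 0.013, 0.003, 0.010, 0.001, 0.011, 0.017,
             -0.008, -0.005, 0.027, 0.014, 0.010, -0.017, 0.001, -0.013, 0.027, -0.019,
             0.000, 0.001)
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

df <- data.frame(returns, count)
# same grouping expression as above, then drop the rows with NA count
df <- na.omit(transform(df, g = cumsum(c(0, diff(!is.na(count)) == 1))))

# per-group maximum, repeated on every row of its group
df$maxRet <- ave(df$returns, df$g, FUN = max)

# per-group cumulative sum that counts the first return of each group as 0
df$cumsumRet <- ave(df$returns, df$g, FUN = function(x) cumsum(replace(x, 1, 0)))

df
```

The cumulative sums computed this way can differ in the third decimal from the sumCount column in the question, presumably because the returns listed there are rounded.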

Answer 2

Ah, so "count" defines the groups and you want the cumsum per group and the max per group. I think in data.table, so here is how I would do it.

library(xts)
library(data.table)

date <- as.Date(c("2010-11-18", "2010-11-19", "2010-11-26", "2010-12-03", "2010-12-10",
                  "2010-12-17", "2010-12-24", "2010-12-31", "2011-01-07", "2011-01-14",
                  "2011-01-21", "2011-01-28", "2011-02-04", "2011-02-11", "2011-02-18",
                  "2011-02-25", "2011-03-04", "2011-03-11", "2011-03-18", "2011-03-25",
                  "2011-03-26", "2011-03-27"))

returns <- c(0.002,0.000,-0.009,0.030, 0.013,0.003,0.010,0.001,0.011,0.017,
             -0.008,-0.005,0.027,0.014,0.010,-0.017,0.001,-0.013,0.027,-0.019,
             0.000,0.001)
count <- c(NA,NA,1,1,2,2,3,4,5,6,7,7,7,7,7,NA,NA,NA,1,2,NA,NA)
maxCount <- c(NA,NA,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,0.030,
              0.030,0.030,0.030,0.030,NA,NA,NA,0.027,0.027,NA,NA)
sumCount <- c(NA,NA,0.000,0.030,0.042,0.045,0.056,0.056,0.067,0.084,0.077,
              0.071,0.098,0.112,0.123,NA,NA,NA,0.000,-0.019,NA,NA)

DT <- data.table(date, returns, count)
DT[!is.na(count),max:=max(returns),by=count]
DT[!is.na(count),cumSum:= cumsum(returns),by=count]

# if you need an xts object at the end:

xtsData <- xts(cbind(DT$returns,DT$count, DT$max,DT$cumSum),DT$date)
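
Note that by=count groups together all rows that share the same count value, even when they belong to different stretches of the series. If the groups are instead meant to be the consecutive runs of non-NA counts, as in the question, one possible variant is sketched below. It assumes a data.table version that provides rleid(); the column names g and maxRet are chosen for this example only.

```r
library(data.table)

returns <- c(0.002, 0.000, -0.009, 0.030, 0.013, 0.003, 0.010, 0.001, 0.011, 0.017,
             -0.008, -0.005, 0.027, 0.014, 0.010, -0.017, 0.001, -0.013, 0.027, -0.019,
             0.000, 0.001)
count <- c(NA, NA, 1, 1, 2, 2, 3, 4, 5, 6, 7, 7, 7, 7, 7, NA, NA, NA, 1, 2, NA, NA)

DT <- data.table(returns, count)

# rleid() numbers consecutive runs; here, alternating runs of NA and non-NA counts
DT[, g := rleid(is.na(count))]

# summaries per run, filled in only on the rows where count is present
DT[!is.na(count), maxRet := max(returns), by = g]
DT[!is.na(count), cumSum := cumsum(returns), by = g]

DT
```

Rows with NA count keep NA in the new columns, so the table stays aligned with the original dates.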


Answer 1: 3, accepted, 2014-08-17 20:08:02

Answer 2: 1, 2014-08-17 20:06:54
