Time-series hierarchy/grouping in R

I have a question about hierarchical grouping of time series in R. I currently have this matrix:

           A      B     C      F     G      H      I
[1,] -33.697  8.610 42.31 17.465 24.84 14.210 10.632
[2,]  -4.698 15.993 20.69  6.222 14.47  3.423 11.047
[3,] -37.458  9.687 47.14 14.659 32.49 12.759 19.726
[4,] -23.851 16.517 40.37 14.392 25.98  9.438 16.538
[5,]   3.329 15.629 12.30  3.449  8.85  2.635  6.215
[6,] -38.071  5.746 43.82 15.932 27.89 14.113 13.772

Just by inspection, I can see that:

  • G = H + I
  • C = F + G
  • A = B - C

Is there a way to find these sum relationships (positive and negative) automatically on large time series in R? I have tried using lm() to figure out the relationships, but that is too time-consuming to do on every series, and there are often collinearity problems as well.

Many thanks!

structure(list(A = c(-33.6970557915047, -4.69841752527282, -37.457728596637, 
-23.8508993089199, 3.32904924079776, -38.0712462896481), B = c(8.60984595282935, 
15.9929901333526, 9.68719404516742, 16.5167794595473, 15.6285679822322, 
5.74573907931335), C = c(42.306901744334, 20.6914076586254, 47.1449226418044, 
40.3676787684672, 12.2995187414344, 43.8169853689615), F = c(17.4649945173878, 
6.22195235290565, 14.6593122615013, 14.3921482057776, 3.44929573708214, 
15.9315551938489), G = c(24.8419072269462, 14.4694553057197, 
32.4856103803031, 25.9755305626895, 8.8502230043523, 27.8854301751126
), H = c(14.2098777298816, 3.42268325854093, 12.7592747195158, 
9.43778987810947, 2.63517117220908, 14.1129822209477), I = c(10.6320294970647, 
11.0467720471788, 19.7263356607873, 16.5377406845801, 6.21505183214322, 
13.7724479541648)), .Names = c("A", "B", "C", "F", "G", "H", 
"I"), row.names = c(NA, -6L), class = "data.frame")

This also uses regression, but it:

  • uses lm.fit, which is faster than lm. (There is also fastLm in RcppArmadillo and RcppEigen, which you could try as well.)

  • avoids duplicate regressions by using only unique combinations.

  • assumes that only triples need to be investigated, which cuts down the amount of computation (that seems to be the case in the post).

  • assumes all coefficients are integers, so that the output can be cleaned up by rounding.

The code is:

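# DF is the data frame from the question (the structure(...) output above)
eps <- .1  # SSE threshold below which a fit counts as an exact relationship
combos <- combn(ncol(DF), 3)  # each column holds one unique triple of column indices
for(j in seq_len(ncol(combos))) {
    ix <- combos[, j]
    # regress the first series of the triple on the other two, without an intercept
    fit <- lm.fit(as.matrix(DF[ix[-1]]), DF[[ix[1]]])
    SSE <- sum(resid(fit)^2)
    if (SSE < eps) {
        ecoef <- round(c(-1, coef(fit)))  # -1 is the coefficient of the response
        names(ecoef)[1] <- names(DF)[ix[1]]
        print(ecoef)
    }
}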

which gives this with the data in the post:

 A  B  C 
-1  1 -1 
 C  F  G 
-1  1  1 
 G  H  I 
-1  1  1 
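If you would rather collect the relations than print them, the same triple search can be wrapped in a function. This is only a minimal sketch: find_triples() is a hypothetical helper name, and the data below are simulated with the same structure as the post (G = H + I, C = F + G, A = B - C) rather than taken from it.

```r
# Simulated stand-in for the data in the post (assumption: same structure)
set.seed(1)
n <- 50
DF <- data.frame(B = rnorm(n), F = rnorm(n), H = rnorm(n), I = rnorm(n))
DF$G <- DF$H + DF$I
DF$C <- DF$F + DF$G
DF$A <- DF$B - DF$C
DF <- DF[c("A", "B", "C", "F", "G", "H", "I")]

# find_triples() is a hypothetical helper, not part of any package.
# It returns a list of named integer-rounded coefficient vectors,
# one per triple whose no-intercept regression has SSE below eps.
find_triples <- function(DF, eps = 0.1) {
  X <- as.matrix(DF)
  combos <- combn(ncol(X), 3)
  out <- list()
  for (j in seq_len(ncol(combos))) {
    ix <- combos[, j]
    fit <- lm.fit(X[, ix[-1], drop = FALSE], X[, ix[1]])
    if (sum(fit$residuals^2) < eps) {
      ecoef <- round(c(-1, fit$coefficients))
      names(ecoef) <- colnames(X)[ix]
      out[[length(out) + 1]] <- ecoef
    }
  }
  out
}

rels <- find_triples(DF)
print(rels)
```

Each element of rels reads as an equation that sums to zero, for example -1*A + 1*B - 1*C = 0. The threshold eps is on the raw SSE scale, so it would need adjusting for series with very different magnitudes or lengths.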

You can try a hierarchical clustering method. This will not give you the exact relationships or the coefficients, but it can give you an idea of which relationships to test for. First we prepare your data.

a<-rbind(c(-33.697,8.610,42.31, 17.465, 24.84, 14.210, 10.632), 
  c(-4.698,15.993,20.69,6.222, 14.47,3.423, 11.047),
  c(-37.458,9.687, 47.14, 14.659, 32.49, 12.759, 19.726),
  c(-23.851,16.517,40.37,14.392,25.98,9.438,16.538),
  c(3.329,15.629,12.30,3.449,8.85,2.635,6.215),
  c(-38.071,5.746,43.82,15.932,27.89,14.113,13.772))
colnames(a)<-c("A", "B", "C", "F", "G", "H", "I")

Then we calculate the correlations between your variables and convert them into distances, which we then cluster.

dd <- as.dist((1 - cor(a))/2)
plot(hclust(dd))

That should give you an idea of the relationships between the different time series. A plot of the result is shown below.

[Figure: cluster dendrogram plot]
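To turn the dendrogram into candidate groups you can cut the tree. The sketch below is a variation, not the exact recipe above: it assumes a distance of 1 - |cor| so that strongly negative correlations (as in A = B - C) also count as close, it uses simulated data with the same structure as the post, and the choice k = 3 is arbitrary.

```r
# Simulated stand-in for the data in the post (assumption: same structure)
set.seed(1)
n <- 50
a <- data.frame(B = rnorm(n), F = rnorm(n), H = rnorm(n), I = rnorm(n))
a$G <- a$H + a$I
a$C <- a$F + a$G
a$A <- a$B - a$C
a <- as.matrix(a[c("A", "B", "C", "F", "G", "H", "I")])

# Assumption: use 1 - |cor| so that sign does not matter for closeness
dd <- as.dist(1 - abs(cor(a)))
hc <- hclust(dd)

# Cut the tree into k groups; k = 3 is only an illustrative choice
groups <- cutree(hc, k = 3)
print(split(names(groups), groups))
```

The groups only suggest which series to regress against each other. They do not establish that an exact sum relationship exists.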

You can find linear dependence relations with MASS::Null. They are equivalent to, but not as sparse as, those you found by visual inspection.

library(MASS)
Null(t(d)) # d is the data frame from the question; one relation per column
#             [,1]        [,2]        [,3]
# [1,]  0.41403998 -0.04178588  0.45582586
# [2,] -0.41403998  0.04178588 -0.45582586
# [3,] -0.02626794 -0.52439443  0.49812649
# [4,]  0.44030792  0.48260856 -0.04230063
# [5,]  0.62687195 -0.01159430 -0.36153375
# [6,] -0.18656403  0.49420285  0.31923312
# [7,] -0.18656403  0.49420285  0.31923312
as.matrix(d) %*% Null(t(d))  # all entries are numerically zero
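One way to check the "equivalent" claim is to compare the span of the Null basis with the span of the sparse relations found by inspection. The following is a sketch on simulated data with the same structure as the post (an assumption, not the original numbers). The matrix R of hand-written relations is introduced here for illustration only.

```r
library(MASS)

# Simulated stand-in for the data in the post (assumption: same structure)
set.seed(1)
n <- 50
d <- data.frame(B = rnorm(n), F = rnorm(n), H = rnorm(n), I = rnorm(n))
d$G <- d$H + d$I
d$C <- d$F + d$G
d$A <- d$B - d$C
d <- d[c("A", "B", "C", "F", "G", "H", "I")]

N <- Null(t(d))  # dense basis of the relations, one per column

# Sparse relations found by inspection, one per column (rows follow names(d))
R <- cbind(c(1, -1, 1,  0,  0,  0,  0),   # A - B + C = 0
           c(0,  0, 1, -1, -1,  0,  0),   # C - F - G = 0
           c(0,  0, 0,  0,  1, -1, -1))   # G - H - I = 0
rownames(R) <- names(d)

# Each sparse relation should annihilate the data up to rounding error
print(max(abs(as.matrix(d) %*% R)))

# If both sets span the same space, stacking them adds no rank
print(c(rank_N = qr(N)$rank, rank_R = qr(R)$rank,
        rank_both = qr(cbind(N, R))$rank))
```

If the three ranks agree, every column of N is a linear combination of the sparse relations and vice versa, which is what "equivalent" means here.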
