
Cross-correlation with multiple groups in one data.table

I'd like to calculate the cross-correlations between groups of time series within one data.table. I have time series data in this format:

library(data.table)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )

   group           Y
 1:    a  0.90855520
 2:    a -0.12463737
 3:    a -0.45754652
 4:    a  0.65789709
 5:    a  1.27632196
 6:    b  0.98483700
 7:    b -0.44282527
 8:    b -0.93169070
 9:    b -0.21878359
10:    b -0.46713392
11:    c -0.02199363
12:    c -0.67125826
13:    c  0.29263953
14:    c -0.65064603
15:    c -1.41143837

Each group has the same number of observations. What I am looking for is a way to obtain the cross-correlation between the groups:

group.1   group.2    correlation
      a         b          0.xxx
      a         c          0.xxx
      b         c          0.xxx

I am working on a script to subset each group and append the cross-correlations, but the data size is fairly large. Is there any efficient / zen way to do this?

Does this help?

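# index the observations within each group so the series can be aligned
data[, id := seq_len(.N), by = group]
# reshape to wide format (one column per group), then drop the index
dtw  = dcast.data.table(data, id ~ group, value.var="Y" )[, id := NULL]
# pairwise correlation matrix between the group columns
cor(dtw)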

See "Correlation between groups in R data.table".
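The call to cor() on the wide table returns a square matrix, while the question asks for one row per pair of groups. A minimal sketch of that last step, assuming the same toy data as in the question; the names `wide`, `cm`, `idx` and `pairs` are made up for illustration and are not part of the original answer:

```r
library(data.table)

# toy data in the same shape as in the question (seed chosen arbitrarily)
set.seed(1)
data <- data.table(group = rep(c("a", "b", "c"), each = 5), Y = rnorm(15))

# reshape to wide format: one column per group, aligned by position in the group
data[, id := seq_len(.N), by = group]
wide <- dcast(data, id ~ group, value.var = "Y")[, id := NULL]

# full correlation matrix between the group columns
cm <- cor(wide)

# keep only the upper triangle, so that each unordered pair appears once
idx <- which(upper.tri(cm), arr.ind = TRUE)
pairs <- data.table(
  group.1     = rownames(cm)[idx[, "row"]],
  group.2     = colnames(cm)[idx[, "col"]],
  correlation = cm[idx]
)
print(pairs)
```

This keeps the heavy lifting inside a single cor() call, which is usually preferable to looping over subsets when there are many groups.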


Another way would be:

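# data
library(data.table)
set.seed(45L)
data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5)) , Y = rnorm(15) )

# method 2
setkey(data, "group")
# reorder the groups (b, c, a) so that each group is paired with the next one
data2 = data[J(c("b", "c", "a"))][, list(group2=group, Y2=Y)]
# attach the reordered series as new columns group2 and Y2
data[, c(names(data2)) := data2]

# correlation for each (group, group2) pair
data[, cor(Y, Y2), by=list(group, group2)]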

#     group group2         V1
# 1:      a      b -0.2997090
# 2:      b      c  0.6427463
# 3:      c      a -0.6922734

And to generalize this "other" way to more than three groups...

data = data.table( group = c(rep("a", 5),rep("b",5),rep("c",5),rep("d",5)) ,
                   Y = rnorm(20) )
setkey(data, "group")

groups = unique(data$group)
ngroups = length(groups)
library(gtools)
pairs = combinations(ngroups,2,groups)

d1 = data[pairs[,1],,allow.cartesian=TRUE]
d2 = data[pairs[,2],,allow.cartesian=TRUE]
d1[,c("group2","Y2"):=d2]
d1[,cor(Y,Y2), by=list(group,group2)]
#    group group2          V1
# 1:     a      b  0.10742799
# 2:     a      c  0.52823511
# 3:     a      d  0.04424170
# 4:     b      c  0.65407400
# 5:     b      d  0.32777779
# 6:     c      d -0.02425053
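If adding the gtools dependency is not desirable, the same set of pairs can probably be built with base R's combn(). The following is only a sketch under that assumption, not the method used in the answer above; the column names `group.1`, `group.2` and `correlation` follow the layout requested in the question, and the seed is arbitrary:

```r
library(data.table)

# toy data with four groups of equal length
set.seed(1)
data <- data.table(group = rep(c("a", "b", "c", "d"), each = 5), Y = rnorm(20))

# all unordered pairs of group labels, one pair per row
groups <- unique(data$group)
pairs <- as.data.table(t(combn(groups, 2)))
setnames(pairs, c("group.1", "group.2"))

# correlation of the two series for every pair
pairs[, correlation := mapply(
  function(g1, g2) cor(data[group == g1, Y], data[group == g2, Y]),
  group.1, group.2,
  USE.NAMES = FALSE
)]
print(pairs)
```

This subsets the table once per pair, so for a large number of groups the wide-format cor() approach is likely to be faster.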
