
Cross-correlation of 5 time series (distance) and interpretation

I would greatly appreciate some input on this!

I have data for 5 time series (an example of one step in the series is in the plot below), where each step in the series is a vertical profile of species sightings in the ocean, and the profiles were recorded 6 h apart. All 5 profiles are sampled at 0.1 m vertical intervals (and are 6 h apart in time).

What I want to do is calculate the multivariate cross-correlation between all series in order to find out at which lag the profiles are most correlated and stable over time.

Profile example: [image: example plot of one time step]

I find the R documentation on this not so great, so what I have done so far is use the ccm function from the MTS package to create cross-correlation matrices. However, the figures are rather difficult to interpret with such sparse documentation. I would really appreciate some help with that.

Data example: http://pastebin.com/embed_iframe.php?i=8gdAeGP4 (save it as cross_correlation_stack.csv, or change the file name as you wish).

library(dplyr)
library(MTS)
library(data.table)

d1 <- file.path('cross_correlation_stack.csv')
d2 = read.csv(d1)

# USING package MTS
mod1<-ccm(d2,lag=1000,level=T)

#USING base R
acf(d2,lag.max=1000)

# MQ plot also from MTS package
mq(d2,lag=1000)

Which produces this (the ccm command): [image]

This: [image]

and this: [image]

In parallel, the acf command from above produces this: [image]

My question now is whether somebody can tell me if I am going in the right direction, or whether there are better-suited packages and commands.

The default figures don't get any titles etc., so what am I looking at, specifically in the ccm figures?

The acf command was proposed somewhere, but can I use it here? Its documentation says it "... calculates autocovariance or autocorrelation ...", which I assume is not what I want. But then again, it's the only command that seems to work on multivariate data. I am confused.

The plot with the significance values shows that after a lag of 150 (15 meters) the p-values increase. How would you interpret that with regard to my data? The sightings are recorded at 0.1 m intervals, and many lags up to 100-150 are significant. Would that mean something like peaks in sightings being stable over the 5 time steps on a scale of 150 lags, i.e. 15 meters?

Either way, it would be nice if somebody who has worked with this before could explain what I am looking at! Any input is highly appreciated!

You can use the base R function ccf(), which estimates the cross-correlation function between two variables x and y. However, it only works on a pair of vectors, so you'll have to loop over the columns of your data frame (d2 in your code; d1 is only the file path). Something like:

# one list slot and one plot panel per unordered pair of columns
n_col <- ncol(d2)
n_pairs <- choose(n_col, 2)
cc <- vector("list", n_pairs)
par(mfrow = c(ceiling(n_pairs / 2), 2))
cnt <- 1
for (i in 1:(n_col - 1)) {
  for (j in (i + 1):n_col) {
    # plot the CCF of column i against column j and keep the estimate
    cc[[cnt]] <- ccf(d2[, i], d2[, j], main = paste0("Cross-correlation of ", colnames(d2)[i], " with ", colnames(d2)[j]))
    cnt <- cnt + 1
  }
}

This will plot each of the estimated CCFs and store the estimates in the list cc. It is important to remember that the lag-k value returned by ccf(x, y) is an estimate of the correlation between x[t+k] and y[t].
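To make that lag convention concrete, here is a minimal sketch on synthetic data (not your profiles). The names z, x, y, cc_xy, best_lag and depth_offset_m are made up for illustration, and the conversion to meters simply assumes the 0.1 m spacing described in the question.

```r
# Synthetic check of the ccf() lag convention: y is x delayed by 3 samples.
set.seed(42)
n <- 500
z <- rnorm(n + 3)
x <- z[4:(n + 3)]  # x[t] = z[t + 3]
y <- z[1:n]        # y[t] = z[t], i.e. y[t] = x[t - 3]

# plot = FALSE returns the estimates without drawing anything
cc_xy <- ccf(x, y, lag.max = 20, plot = FALSE)

# lag with the largest absolute cross-correlation
best_lag <- cc_xy$lag[which.max(abs(cc_xy$acf))]

# cor(x[t + k], y[t]) peaks at k = -3 because x[t - 3] equals y[t]
print(best_lag)

# assumed 0.1 m vertical spacing turns lags into a depth offset
depth_offset_m <- best_lag * 0.1
print(depth_offset_m)
```

The same two lines (which.max on the acf component, then indexing the lag component) could be applied to each element of cc to pull out the peak lag per pair, if a single summary lag is what you are after.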

All of that said, however, the CCF is only defined for data that are more-or-less normally distributed, and your data are clearly overdispersed, with all of those zeroes. Therefore, lacking an adequate transformation, you should really look into other measures of "association", such as the mutual information estimated from entropy. I suggest checking out the R packages entropy and infotheo.
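As a rough illustration of the idea, the sketch below computes a simple plug-in (histogram) estimate of mutual information in base R on made-up zero-heavy counts. It is not the estimator from either package, which provide their own functions; the helper mutual_info, the choice of 5 equal-width bins, and the simulated x, y and w are all assumptions made for this example.

```r
# Plug-in mutual information (in nats) from a binned joint distribution.
mutual_info <- function(x, y, bins = 5) {
  bx <- cut(x, breaks = bins, include.lowest = TRUE)  # equal-width bins
  by <- cut(y, breaks = bins, include.lowest = TRUE)
  p_xy <- table(bx, by) / length(x)  # joint probabilities
  p_x <- rowSums(p_xy)               # marginal of x
  p_y <- colSums(p_xy)               # marginal of y
  nz <- p_xy > 0                     # skip empty cells to avoid log(0)
  sum(p_xy[nz] * log(p_xy[nz] / outer(p_x, p_y)[nz]))
}

set.seed(1)
n <- 1000
x <- rbinom(n, 1, 0.3) * rpois(n, 5)  # zero-heavy counts
y <- x + rpois(n, 1)                  # depends on x
w <- rbinom(n, 1, 0.3) * rpois(n, 5)  # generated independently of x

mi_xy <- mutual_info(x, y)
mi_xw <- mutual_info(x, w)
print(c(dependent = mi_xy, independent = mi_xw))
```

The dependent pair should come out with a clearly larger value than the independent pair. Note that a plug-in estimate is sensitive to the number of bins and is biased upward in small samples, so treat the numbers as relative rather than absolute.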
