
Cross-correlation of 5 time series (distance) and interpretation

I would greatly appreciate some input on this!

I have data for 5 time series (one step in the series is shown in the plot below). Each step is a vertical profile of species sightings in the ocean, and the profiles were recorded 6 h apart. Within each of the 5 profiles, the samples are spaced 0.1 m apart vertically.

What I want to do is calculate the multivariate cross-correlation between all series in order to find out at which lag the profiles are most correlated and stable over time.

Profile example: [image: example plot of one time step]

I find the R documentation on this not so great, so what I have done so far is use the ccm function from the MTS package to create cross-correlation matrices. However, the figures are difficult to interpret given the sparse documentation, and I would appreciate some help with that.

Data example: http://pastebin.com/embed_iframe.php?i=8gdAeGP4 Save it as cross_correlation_stack.csv, or change the file name as you wish.

library(dplyr)
library(MTS)
library(data.table)

d1 <- file.path('cross_correlation_stack.csv')
d2 <- read.csv(d1)

# Using package MTS: cross-correlation matrices
mod1 <- ccm(d2, lag = 1000, level = TRUE)

# Using base R
acf(d2, lag.max = 1000)

# mq plot, also from the MTS package
mq(d2, lag = 1000)

The ccm command produces this: [image]

This:

[image]

and this:

[image]

In parallel, the acf command from above produces this: [image]

My question is whether I am going in the right direction, or whether there are better-suited packages and commands.

The default figures don't get any titles etc., so what am I looking at, specifically in the ccm figures?

The acf command was proposed somewhere, but can I use it here? Its documentation says "... calculates autocovariance or autocorrelation ...", which I assume is not what I want. But then again, it is the only command that seems to work on multivariate data. I am confused.

The plot with the significance values shows that after a lag of 150 (15 meters) the p-values increase. How would you interpret that for my data, which are species sightings at 0.1 m intervals with many lags up to 100-150 being significant? Would that mean that peaks in sightings are stable over the 5 time steps on a scale of 150 lags, i.e. 15 meters?

Either way, it would be nice if somebody who has worked with this before could explain what I am looking at. Any input is highly appreciated!

You can use the base R function ccf(), which estimates the cross-correlation function between any two variables x and y. However, it only works on vectors, so you'll have to loop over the columns of your data frame d2 (d1 is only the file path). Something like:

n_col <- ncol(d2)                       # d2 is the data frame read from the CSV
n_pairs <- choose(n_col, 2)             # number of distinct column pairs
cc <- vector("list", n_pairs)
par(mfrow = c(ceiling(n_pairs / 2), 2)) # two plots per row
cnt <- 1
for (i in 1:(n_col - 1)) {
  for (j in (i + 1):n_col) {
    cc[[cnt]] <- ccf(d2[, i], d2[, j],
                     main = paste0("Cross-correlation of ", colnames(d2)[i],
                                   " with ", colnames(d2)[j]))
    cnt <- cnt + 1
  }
}

This will plot each of the estimated CCFs and store the estimates in the list cc. It is important to remember that the lag-k value returned by ccf(x,y) is an estimate of the correlation between x[t+k] and y[t].
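To make the lag convention concrete, here is a minimal sketch on synthetic data (not your data; the names x, y, shift and best_lag are made up for illustration). x is built as a copy of y delayed by 3 steps, so x[t+3] tracks y[t] and the peak of ccf(x, y) should sit at lag +3:

```r
set.seed(1)
n <- 500
shift <- 3

y <- rnorm(n)
# x is y delayed by `shift` steps plus a little noise, so x[t + shift] ~ y[t]
x <- c(rnorm(shift), y[1:(n - shift)]) + rnorm(n, sd = 0.1)

est <- ccf(x, y, lag.max = 20, plot = FALSE)

# est$lag and est$acf are 3-d arrays; drop() turns them into plain vectors
lags <- drop(est$lag)
vals <- drop(est$acf)

# Lag at which the absolute cross-correlation is largest
best_lag <- lags[which.max(abs(vals))]
best_lag
```

The same extraction should work on each element of the list cc from the loop above, e.g. sapply(cc, function(e) drop(e$lag)[which.max(abs(drop(e$acf)))]), which gives one peak lag per pair of columns.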

All of that said, the CCF is really only meaningful for data that are more or less normally distributed, whereas your data are clearly overdispersed, with all of those zeroes. Therefore, unless you can find an adequate transformation, you should look into other measures of "association", such as the mutual information estimated from entropy. I suggest checking out the R packages entropy and infotheo.
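As a rough illustration of the idea, here is a base-R sketch of the plug-in (empirical) mutual information between two discrete vectors. It does not use the entropy or infotheo API; mutual_info is a hypothetical helper written for this example, and the data are simulated zero-heavy counts rather than your profiles. Continuous measurements would first have to be binned (for example with cut()), and the packages mentioned above are the place to look for more careful estimators.

```r
# Plug-in mutual information (in nats) between two discrete vectors
mutual_info <- function(a, b) {
  joint <- table(a, b) / length(a)   # empirical joint probabilities
  pa <- rowSums(joint)               # marginal distribution of a
  pb <- colSums(joint)               # marginal distribution of b
  indep <- outer(pa, pb)             # joint distribution under independence
  nz <- joint > 0                    # empty cells contribute nothing
  sum(joint[nz] * log(joint[nz] / indep[nz]))
}

set.seed(42)
n <- 1000
a <- rpois(n, lambda = 0.5)          # sparse counts with many zeroes
b <- a + rpois(n, lambda = 0.2)      # depends strongly on a
z <- rpois(n, lambda = 0.5)          # generated independently of a

mi_dep <- mutual_info(a, b)          # dependent pair
mi_ind <- mutual_info(a, z)          # independent pair
c(dependent = mi_dep, independent = mi_ind)
```

The dependent pair should give a clearly larger value than the independent pair, whose estimate is close to zero but slightly positive because the plug-in estimator is biased upwards in finite samples. To look at lags, one could shift one vector against the other before calling the function, in the same spirit as the CCF.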
