简体   繁体   中英

Running window of correlation coefficients for time series in R

I wish to calculate the correlation coefficients of 12 variables using a running window in R.

My data is stored in a zoo object with a %m.%d.%Y %H:%M:%S index, each of the 12 variables have 1343 observations. I don't know what window size I'm going to use but I can change it if needs.

@Joshua Ulrich has posted here how to calculate the rolling correlation using rollapplyr but this example has only two variables. With my limited R experience I am unsure as to how I can incorporate one of the apply family functions to run the correlations across all 12 variables.

Any pointers would be greatly appreciated

My data looks as follows:

> head(wideRawXTS)
                    DO0182U09A3 DO0182U09B3 DO0182U09C3 DO0182U21A1 DO0182U21A2 DO0182U21A3 DO0182U21B1 DO0182U21B2 DO0182U21B3
2017-01-20 16:30:00     -101.50     -103.37     -103.86     -104.78     -104.95     -105.33     -102.50      -99.43     -104.05
2017-01-20 16:45:00     -101.32     -102.75     -104.22     -104.51     -103.94     -105.29     -102.82     -101.99     -103.94
2017-01-20 17:00:00     -101.45     -103.30     -103.93     -104.70     -104.82     -105.13     -103.72     -103.95     -104.25
2017-01-20 17:15:00     -100.91      -95.92      -99.22     -103.83     -104.72     -105.19     -103.57     -101.36     -104.09
2017-01-20 17:30:00     -100.91     -103.04     -104.09     -102.15     -104.91     -105.18     -103.88     -104.09     -103.96
2017-01-20 17:45:00     -100.97     -103.67     -104.12     -105.07     -104.23      -97.48     -103.92     -103.89     -104.01
                    DO0182U21C1 DO0182U21C2 DO0182U21C3
2017-01-20 16:30:00     -104.51     -104.42     -105.17
2017-01-20 16:45:00     -104.74     -104.65     -105.25
2017-01-20 17:00:00     -105.02     -105.04     -105.32
2017-01-20 17:15:00     -103.90     -102.95     -105.16
2017-01-20 17:30:00     -104.75     -105.07     -105.23
2017-01-20 17:45:00     -105.08     -105.14     -104.89

WideRawDFscale的Cor输出

在此处输入图片说明

#Importing Data
rawDF <- read.csv("RTWP_DO0182_14Day_Raw.csv",
                  header = F,
                  col.names = c("Period Start Time",
                                "PLMNname",
                                "RNCname",
                                "WBTSname",
                                "WBTSID",
                                "WCELname",
                                "WCELID",
                                "Overload",
                                "AverageRTWP",
                                "DC_RTWP_High_PRX",
                                "RTWP_Threshold"),
                  colClasses = c("character",
                                 "NULL", #drops PLMN name
                                 "NULL", #drops RNC name
                                 "character", 
                                 "integer",
                                 "character",
                                 "integer",
                                 "NULL", #drops Overload
                                 "numeric",
                                 "NULL", #drops DC_RTWP_High_PRX
                                 "NULL"), #drops RTWP_Threshold
                  strip.white=TRUE,
                  na.strings = c("NA", 
                                 "NULL"),
                  as.is = T,
                  skip=2)

#convert period.start.time to POSIXlt
rawDF$Period.Start.Time <- as.POSIXlt(strptime(rawDF$Period.Start.Time, 
                                               format="%m.%d.%Y %H:%M:%S"))

#dcast the long data frame to a wide data frame
wideRawDF <- dcast(rawDF, 
                   Period.Start.Time ~ WCELname,
                   value.var = "AverageRTWP")

#assign the date times as rownames for converting to XTS
rownames(wideRawDF) = wideRawDF[[1]]

#drops the duplicate Period Start Time column since date times are rownames
wideRawDF$Period.Start.Time <- NULL

#Convert wideRawDF to XTS time series
wideRawDF <- as.xts(wideRawDF)

#NA values replaced by interpolated values
wideRawDF <- na.approx(wideRawDF, na.rm = FALSE)

#DF is centered by subtracting the column means and scaled by dividing the
#centered columns by their standard deviations
wideRawDFscaled <- scale(wideRawDF, center = TRUE, scale = TRUE)

#window <- 20
#cors <- combn(names(wideRawDFscaled),2,
#              function(p) rollapply(wideRawDFscaled, window ,
#                                    function(x) cor(x[,p[1]],x[,p[2]]),
#                                    by.column = FALSE))
#colnames(cors) <- combn(names(wideRawDFscaled),2,paste,collapse=".")

#Running window of correlation coefficients
Cor <- function(x) {
  corr <- cor(wideRawDFscaled)
  out <- as.data.frame.table(corr)[lower.tri(corr), ]
  with(out, setNames(Freq, paste(Var1, Var2)))
}
slidingCor <- rollapplyr(wideRawDFscaled, 6, Cor, by.column = FALSE)

You can use combn to apply the correlation to each pair of columns. Adapting the answer you referenced:

window <- 20
cors <- combn(colnames(wideRawXTS),2,
              function(p) rollapply(wideRawXTS, window  ,
                                    function(x) cor(x[,p[1]],x[,p[2]]), 
                                    by.column=FALSE))
colnames(cors) <- combn(colnames(wideRawXTS),2, paste, collapse=".")

Assuming the 3 column input shown reproducibly in the Note and using a window of width 3 for illustration define the correlation function Cor that accepts a matrix and computes its correlation matrix extracting the lower triangular part to eliminate redundancy and adding column names. Now use that with rollapplyr :

library(xts)

Cor <- function(x) {
    corr <- cor(x)
    out <- as.data.frame.table(corr)[lower.tri(corr), ]
    with(out, setNames(Freq, paste(Var1, Var2)))
}
rollapplyr(xx, 3, Cor, by.column = FALSE)

giving:

                    DO0182U09B3.DO0182U09A3 DO0182U09C3.DO0182U09A3 DO0182U09C3.DO0182U09B3
2017-01-20 16:30:00                      NA                      NA                      NA
2017-01-20 16:45:00                      NA                      NA                      NA
2017-01-20 17:00:00                 0.98573                -0.99613                -0.99671
2017-01-20 17:15:00                 0.98629                 0.95983                 0.99297
2017-01-20 17:30:00                 0.52664                 0.47475                 0.99820
2017-01-20 17:45:00                 0.56204                 0.50460                 0.99769

Note: Input xx in reproducible form is:

xx <- structure(c(-101.5, -101.32, -101.45, -100.91, -100.91, -100.97, 
-103.37, -102.75, -103.3, -95.92, -103.04, -103.67, -103.86, 
-104.22, -103.93, -99.22, -104.09, -104.12), .Dim = c(6L, 3L), .Dimnames = list(
    NULL, c("DO0182U09A3", "DO0182U09B3", "DO0182U09C3")), index = structure(c(1484947800, 
1484948700, 1484949600, 1484950500, 1484951400, 1484952300), tzone = "", tclass = c("POSIXct", 
"POSIXt")), class = c("xts", "zoo"), .indexCLASS = c("POSIXct", 
"POSIXt"), tclass = c("POSIXct", "POSIXt"), .indexTZ = "", tzone = "")

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM