
Speeding up correlation matrix calculation in R

I have a data frame with 49 variables and 4 million rows, and I want to calculate its 49 x 49 correlation matrix. All columns are of class numeric.

Here's a sample:

df <- data.frame(replicate(49, sample(0:50, 4000000, replace = TRUE)))

I used the standard cor function.

cor_matrix <- cor(df, use = "pairwise.complete.obs")

This takes a really long time. I have 16 GB of RAM and a 2.60 GHz i5 (single core).

Is there a way to make this calculation faster on my desktop?

There's a faster version of the cor function in the WGCNA package (used for inferring gene networks based on correlations). On my 3.1 GHz i7 with 16 GB of RAM, it computes the same 49 x 49 matrix about 20x faster:

mat <- replicate(49, as.numeric(sample(0:50, 4000000, replace = TRUE)))

system.time(
    cor_matrix <- cor(mat, use = "pairwise.complete.obs")
)
   user  system elapsed 
 40.391   0.017  40.396 

system.time(
    cor_matrix_w <- WGCNA::cor(mat, use = "pairwise.complete.obs")
)
   user  system elapsed 
  1.822   0.468   2.290 

all.equal(cor_matrix, cor_matrix_w)
[1] TRUE

Check the help file for WGCNA::cor for details on how the two versions differ when your data contain a larger share of missing observations.
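If the data contain no missing values at all, as in the sample above, the Pearson correlation matrix can also be obtained from a single matrix product in base R. The following is only a minimal sketch under that assumption; `fast_cor` is a hypothetical helper name, not part of base R or WGCNA, and it has not been benchmarked against either function here.

```r
# Hypothetical helper (not part of base R or WGCNA):
# Pearson correlation via one matrix product.
# Assumes a numeric input with NO missing values.
fast_cor <- function(x) {
  x <- as.matrix(x)
  stopifnot(is.numeric(x), !anyNA(x))
  x <- sweep(x, 2, colMeans(x))               # center each column
  x <- sweep(x, 2, sqrt(colSums(x^2)), "/")   # scale each column to unit length
  crossprod(x)                                # t(x) %*% x gives the correlations
}

# Small check against base cor() on data shaped like the sample above
set.seed(1)
m <- replicate(5, as.numeric(sample(0:50, 1000, replace = TRUE)))
all.equal(fast_cor(m), cor(m), check.attributes = FALSE)
```

The two results should agree up to floating-point tolerance. How much faster this is depends on the BLAS library your R installation is linked against, and a constant column will produce NaN entries because its scaling factor is zero.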
