Parallelize column pairwise matrix comparison

For a given matrix named db.mtx.rnk, I'm calculating column-pairwise Kendall and Spearman correlations and saving the results in a square matrix. The problem is that the input matrix is quite big (~5000x5000), so the number of pairwise combinations is very high and the computation takes a long time. One option that would halve the time is to calculate only the upper triangle, which I have not implemented yet, but that would still be slow. I would like to parallelize the computation. Any hints?

Current code:

# -- get pairwise column combinations (one column per ordered pair of column indices)
pairwise.permuts <- t(expand.grid(1:ncol(db.mtx.rnk), 1:ncol(db.mtx.rnk)))

# -- iterate over two stats of interest, keeping one result matrix per stat
stats.list <- list()
for(stat in c("kendall", "spearman")){

  # -- kendall tau and spearman
  stats.vec <- apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
  stats.list[[stat]] <- stats.mtx  # otherwise the second stat overwrites the first
}

Thanks

There are many ways to parallelise in R. Some options are the parallel, foreach and future packages. Given your code, the fewest changes are needed with the future-based package future.apply, as it provides the function future_apply, a parallel counterpart of apply. You have to set a plan to tell future that the work should run in parallel: plan(multisession) uses separate background R sessions. Older versions of future also offered plan(multiprocess), which chose between separate R sessions and forking depending on your OS, but it has since been deprecated. This leads to the following code (which already speeds up a toy example on my machine):

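library(future.apply)
# run workers in separate R sessions; plan(multiprocess) is deprecated in recent versions of future
plan(multisession)
stats.list <- list()
for(stat in c("kendall", "spearman")){

  # -- kendall tau and spearman, one correlation per column pair, computed in parallel
  stats.vec <- future_apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
  stats.list[[stat]] <- stats.mtx  # keep each method's matrix instead of overwriting it
}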
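The upper-triangle idea from the question can be combined with the same approach. The following is only a sketch under some assumptions: pairwise_cor_upper is a hypothetical helper name (it is not part of future.apply), the toy matrix stands in for db.mtx.rnk, and the diagonal is set to 1 on the assumption that no column is constant. For 5000 columns this visits 12,497,500 unordered pairs instead of 25,000,000 ordered ones.

```r
library(future.apply)

# Hypothetical helper (not from the original post): correlate each unordered
# column pair once, then mirror the values into a symmetric matrix.
pairwise_cor_upper <- function(mtx, method = c("kendall", "spearman")) {
  method <- match.arg(method)
  p <- ncol(mtx)
  pairs <- combn(p, 2)  # 2 x choose(p, 2) matrix of column indices with i < j

  # one correlation per pair, distributed over the workers set by plan()
  vals <- future_apply(pairs, 2, function(ix) {
    cor(mtx[, ix[1]], mtx[, ix[2]], method = method)
  })

  out <- diag(1, p)                           # assumes no constant columns
  out[t(pairs)] <- vals                       # fill the upper triangle
  out[t(pairs[2:1, , drop = FALSE])] <- vals  # mirror into the lower triangle
  dimnames(out) <- list(colnames(mtx), colnames(mtx))
  out
}

plan(multisession)

# toy data standing in for db.mtx.rnk
set.seed(1)
toy <- matrix(rnorm(200), nrow = 20, dimnames = list(NULL, paste0("c", 1:10)))

stats.list <- lapply(c(kendall = "kendall", spearman = "spearman"),
                     function(stat) pairwise_cor_upper(toy, stat))

plan(sequential)  # shut down the background workers
```

It may also be worth timing a plain call first: cor() accepts a whole matrix, so cor(db.mtx.rnk, method = "spearman") returns the full column-by-column matrix in a single call without any explicit loop over pairs.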
