Parallelize column pairwise matrix comparison
For a given matrix named db.mtx.rnk, I'm calculating column-pairwise Kendall and Spearman correlations and saving the results in a square matrix. The problem is that the input matrix is quite big (~5000x5000), so the number of pairwise combinations is very high and the computation takes a long time. One option to cut the time in half would be to calculate only the upper triangle, which I have not implemented yet, but that would still be slow. I would like to parallelize the computation. Any hints?
Current code:
# -- get pairwise column combinations
pairwise.permuts <- t(expand.grid(1:ncol(db.mtx.rnk), 1:ncol(db.mtx.rnk)))
# -- iterate over two stats of interest
for(stat in c("kendall", "spearman")){
  # -- kendall tau and spearman
  stats.vec <- apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
}
Thanks
There are many ways to parallelize in R. Some options are the parallel, foreach and future packages. Given your code, the fewest changes are needed with the future-based package future.apply, since it provides the function future_apply, which can stand in for apply. You have to call plan(multiprocess) to tell future that the calculation should run in parallel. multiprocess uses separate R sessions or forking, depending on your OS. (In recent releases of future, multiprocess is deprecated; plan(multisession) is the replacement that works on all platforms.) This leads to the following code, which already speeds up a toy example on my machine:
library(future.apply)
plan(multisession) # 'multiprocess' is deprecated in recent versions of future; multisession works on all platforms
for(stat in c("kendall", "spearman")){
  # -- kendall tau and spearman
  stats.vec <- future_apply(pairwise.permuts, 2, function(x) cor(db.mtx.rnk[,x[1]], db.mtx.rnk[,x[2]], method = stat))
  stats.mtx <- matrix(stats.vec, ncol = ncol(db.mtx.rnk))
  colnames(stats.mtx) <- colnames(db.mtx.rnk)
  rownames(stats.mtx) <- colnames(db.mtx.rnk)
}
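The upper-triangle idea from the question can be combined with the same approach. The following is only a sketch under a few assumptions: the toy matrix stands in for the real db.mtx.rnk, pairwise_cor and stats.list are hypothetical names introduced here, and no column is constant (otherwise the diagonal would not be 1). It uses combn to enumerate each column pair once, mirrors the values into the lower triangle, and keeps one result matrix per method instead of overwriting stats.mtx.

```r
library(future.apply)

# multisession starts background R sessions and works on all platforms
plan(multisession)

# Toy stand-in for db.mtx.rnk (assumption: a numeric matrix with named columns)
set.seed(1)
db.mtx.rnk <- matrix(rnorm(200), nrow = 20,
                     dimnames = list(NULL, paste0("c", 1:10)))

# Each column of 'pairs' holds one (i, j) combination with i < j
pairs <- combn(ncol(db.mtx.rnk), 2)

# Hypothetical helper: correlation matrix for one method, computed in parallel
pairwise_cor <- function(m, pairs, method) {
  vals <- future_apply(pairs, 2, function(p) {
    cor(m[, p[1]], m[, p[2]], method = method)
  })
  idx <- t(pairs)                         # one (i, j) index pair per row
  out <- diag(1, ncol(m))                 # a column correlates perfectly with itself
  out[idx] <- vals                        # fill the upper triangle
  out[idx[, 2:1, drop = FALSE]] <- vals   # mirror into the lower triangle
  dimnames(out) <- list(colnames(m), colnames(m))
  out
}

# One square matrix per statistic, kept in a named list
stats.list <- lapply(c(kendall = "kendall", spearman = "spearman"),
                     function(stat) pairwise_cor(db.mtx.rnk, pairs, stat))

plan(sequential)  # shut the background workers down again
```

On small inputs the result can be checked against cor(db.mtx.rnk, method = "spearman"), which computes the full column-pairwise matrix in a single call. By default future.apply splits the pairs into one chunk per worker, so the per-pair overhead stays small even when the number of combinations is large.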