
Parallel computing in R using mclapply

I want to compare the performance of two methods on the same dataset. To get multiple comparisons between them I'm using the bootstrap, so I thought parallel computing would be a good idea. Since the number of bootstrap rounds is 50, I assign 50 cores to the job. The pseudocode is as follows:

library(parallel)  # provides mclapply

num.round <- 50  # number of bootstrap rounds: 50 resampled subsets, one comparison per subset
rounds.btsp <- seq_len(num.round)

# METHOD1, METHOD2 and COMPARE are placeholders for the real methods
BootStrap <- function(round.btsp) {
    result1 <- METHOD1(round.btsp)
    result2 <- METHOD2(round.btsp)
    list(result1 = result1, result2 = result2)
}

results.btsp <- mclapply(rounds.btsp, BootStrap, mc.cores = num.round)

for (round.btsp in rounds.btsp) {
    result1 <- results.btsp[[round.btsp]]$result1
    result2 <- results.btsp[[round.btsp]]$result2
    COMPARE(result1, result2)  # runs once per bootstrap round, 50 times in total
}

I got an error in the COMPARE step, and when I looked into it I found that for round.btsp = 10 there was nothing in result1 or result2. So I set round.btsp to 10 and ran the body of the BootStrap function by hand, and everything was fine. Then I ran the whole script again and the same error occurred, but this time at round.btsp = 20 (10 and 20 are just examples).
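An empty element that moves from round to round can be inspected directly in the returned list. As far as I know, mclapply stores a "try-error" object when the function raises an error inside a worker, and leaves NULL when a worker process dies without delivering a result. The sketch below only illustrates that check: classify.result is a hypothetical helper, and demo is a hand-built stand-in for results.btsp, not real output.

```r
# Hypothetical helper: label one element of a list returned by mclapply.
classify.result <- function(x) {
    if (is.null(x)) {
        "null"    # the worker delivered nothing
    } else if (inherits(x, "try-error")) {
        "error"   # the function raised an error inside the worker
    } else {
        "ok"
    }
}

# Hand-built stand-in for results.btsp (assumption, for illustration only).
demo <- list(
    list(result1 = 1, result2 = 2),
    NULL,
    try(stop("METHOD1 failed"), silent = TRUE)
)

status <- vapply(demo, classify.result, character(1))
print(which(status != "ok"))  # indices of the rounds that need a closer look
```

Running the same check on results.btsp right after the mclapply call would show which rounds failed, and whether they failed with an error message or with nothing at all.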

There are 80 cores on our server in total, but other users also use some of them from time to time.

Given what I have observed and the situation with our cores, my guess is that when I request 50 cores and not enough of them are free, some of the threads don't run properly, so I get nothing back from them.
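Whether a shortage of free cores is really the cause is not established here. mclapply forks worker processes, and the operating system normally time-shares them when more are requested than there are idle cores, so a busy server should slow the job down rather than empty a result. Still, on a shared machine it may be worth capping the number of workers. A minimal sketch, where n.workers and the squaring function are assumptions made for illustration:

```r
library(parallel)

num.round <- 50

# detectCores() reports the cores on the machine, not the idle ones,
# and can return NA on some platforms.
avail <- detectCores()
if (is.na(avail)) avail <- 1L

# Leave one core free and never ask for more workers than rounds.
n.workers <- max(1L, min(num.round, avail - 1L))

# mclapply relies on forking, which Windows does not support.
if (.Platform$OS.type == "windows") n.workers <- 1L

# Placeholder workload standing in for BootStrap.
results <- mclapply(seq_len(num.round), function(i) i^2, mc.cores = n.workers)
```

With fewer workers than rounds, each worker simply processes several rounds in turn.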

Problem solved. I later found that the problem actually occurs before COMPARE, in the step that computes results.btsp. For now, my solution is to check the length of every element in results.btsp; if any element fails the check, results.btsp is recalculated, i.e. the parallel computation is run again. Only once all elements pass the check does the script move on to the for loop.
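The check-and-rerun idea can be sketched as follows. This is one possible variant rather than the exact code I used: it reruns only the rounds that failed instead of the whole list, and it gives up after a fixed number of tries. The names is.failed, run.with.retry and max.tries are hypothetical.

```r
library(parallel)

# Hypothetical check: TRUE when an element does not look like a valid result.
is.failed <- function(x) {
    is.null(x) || inherits(x, "try-error") || !is.list(x) ||
        is.null(x$result1) || is.null(x$result2)
}

# Hypothetical wrapper: rerun only the failed rounds, up to max.tries times.
run.with.retry <- function(rounds, fun, cores, max.tries = 3) {
    results <- vector("list", length(rounds))
    todo <- seq_along(rounds)
    for (attempt in seq_len(max.tries)) {
        # Assigning a list with `[<-` keeps NULL elements in place.
        results[todo] <- mclapply(rounds[todo], fun, mc.cores = cores)
        todo <- which(vapply(results, is.failed, logical(1)))
        if (length(todo) == 0) break
    }
    if (length(todo) > 0) {
        stop("rounds still failing after ", max.tries, " tries: ",
             paste(todo, collapse = ", "))
    }
    results
}
```

It would be called as results.btsp <- run.with.retry(rounds.btsp, BootStrap, cores = num.round) in place of the plain mclapply call. Retrying hides the symptom but not the cause, so it is still worth looking at what the failed elements contain.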
