
Parallel computing in R using mclapply

I want to compare the performance of two methods on the same dataset. To get multiple comparisons between them I'm using bootstrap, so I thought parallel computing would be a good idea. Since there are 50 bootstrap rounds, I assign 50 cores to the job. The pseudocode is as follows:

library(parallel)  # provides mclapply

# Number of bootstrap rounds: 50 subsets of the original dataset,
# one comparison between the two methods per subset
num.round <- 50
rounds.btsp <- seq_len(num.round)

BootStrap <- function(round.btsp) {
    result1 <- METHOD1(round.btsp)
    result2 <- METHOD2(round.btsp)
    return(list(result1 = result1, result2 = result2))
}

results.btsp <- mclapply(rounds.btsp, BootStrap, mc.cores = num.round)

for (round.btsp in rounds.btsp){
    result1 <- results.btsp[[round.btsp]]$result1
    result2 <- results.btsp[[round.btsp]]$result2
    COMPARE(result1, result2)  # do the comparison here, and this will be repeated 50 times
}

I got an error at the COMPARE step. When I looked into it, I found that for round.btsp = 10 there was nothing in result1 or result2. So I set round.btsp to 10 and ran the body of the function BootStrap by hand, and everything was fine. Then I ran the whole script again and the same error occurred, except that this time it was at round.btsp = 20 (10 and 20 are just examples).

There are 80 cores on our server in total, but other users also use some of them from time to time.

Given what I have observed and how our cores are shared, my guess is that when I request 50 cores and there are sometimes not enough available, some worker does not run properly, so I get nothing back from it.
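
One way to test a guess like this is to look at what mclapply actually returned for each round. According to the parallel documentation, a job that raises an error comes back as a "try-error" object (possibly for every job scheduled on the same core), and a job whose forked process was killed or failed to deliver comes back as NULL, which would match the empty result1 and result2. The sketch below is only an illustration: toy.round and safe.round are hypothetical stand-ins for BootStrap, round 3 is made to fail on purpose, and mc.cores = 2 assumes a Unix-like system where forking is available.

```r
library(parallel)

# Hypothetical stand-in for BootStrap: round 3 fails on purpose.
toy.round <- function(i) {
    if (i == 3) stop("simulated failure")
    list(result1 = i, result2 = i * 2)
}

# Wrap each round so an error is recorded for that round only,
# instead of affecting the other rounds scheduled on the same core.
safe.round <- function(i) {
    tryCatch(toy.round(i), error = function(e) e)
}

toy.results <- mclapply(1:5, safe.round, mc.cores = 2L)

# A round failed if it came back as NULL (worker died) or as a condition object.
failed <- vapply(toy.results,
                 function(r) is.null(r) || inherits(r, "condition"),
                 logical(1))
print(which(failed))
```

For a failed round, conditionMessage(toy.results[[3]]) gives the error text. A round that is NULL with no error message suggests the worker process itself died (for example from running out of memory) rather than the code for that round being wrong.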

Problem solved. I later found that the problem lies before COMPARE: it is in the step that computes results.btsp. For now, my solution is to check the length of every element of results.btsp; if any element fails the check, results.btsp is recalculated, which means running the parallel computation again. Only when all elements pass the check does the script move on to the for loop.
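
The check-and-rerun idea can be sketched roughly as follows. This is a variation, not exactly what I run: it reruns only the rounds that failed instead of all of them, and it gives up after a fixed number of attempts. The names is.bad, run.with.retry, flaky.round and marker are made up for the illustration, and flaky.round is a hypothetical stand-in for BootStrap that returns NULL the first time round 3 is called. It assumes a Unix-like system where mclapply can fork.

```r
library(parallel)

# TRUE when a round's result is unusable: NULL, a try-error, an error
# object, or a list missing either component.
is.bad <- function(r) {
    !is.list(r) || inherits(r, "condition") ||
        is.null(r$result1) || is.null(r$result2)
}

# Run all rounds, then rerun only the bad ones, up to max.tries attempts.
run.with.retry <- function(rounds, fun, cores = 2L, max.tries = 3L) {
    results <- vector("list", length(rounds))
    todo <- seq_along(rounds)
    for (attempt in seq_len(max.tries)) {
        results[todo] <- mclapply(rounds[todo], fun, mc.cores = cores)
        todo <- todo[vapply(results[todo], is.bad, logical(1))]
        if (length(todo) == 0L) break
    }
    if (length(todo) > 0L) {
        warning("rounds still failing: ", paste(rounds[todo], collapse = ", "))
    }
    results
}

# Hypothetical flaky stand-in for BootStrap. A marker file records the first
# call, because forked workers do not share variables with the parent session.
marker <- tempfile()
flaky.round <- function(i) {
    if (i == 3 && !file.exists(marker)) {
        file.create(marker)
        return(NULL)  # simulate a round that delivers nothing
    }
    list(result1 = i, result2 = i * 2)
}

results.retry <- run.with.retry(1:5, flaky.round)
```

In the real script this would be called as run.with.retry(rounds.btsp, BootStrap, cores = ...), with a cores value below the number of cores that are actually free. Capping the attempts avoids looping forever if a round fails every time.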
