

Why not load balance when parallel computing using snowfall?

For a long time I have been using sfLapply for many of my parallel R scripts. However, as I have delved deeper into parallel computing, I have switched to sfClusterApplyLB, which can save a lot of time when individual instances do not take the same amount of time to run. Whereas sfLapply waits for every instance in a batch to finish before loading a new batch (which may leave instances idle), with sfClusterApplyLB an instance that completes its task is immediately assigned the next remaining element in the list, potentially saving quite a bit of time when instances take varying amounts of time to run. This has led me to question why we would ever NOT want to load balance our runs when using snowfall. The only difference I have found so far is that, when there is an error in the parallelized script, sfClusterApplyLB will still cycle through the entire list before raising the error, while sfLapply will stop after trying the first batch. What else am I missing? Are there any other costs or downsides to load balancing? Below is example code that shows the difference between the two:

rm(list = ls()) #remove all past worksheet variables
working_dir="D:/temp/"
setwd(working_dir)
n_spp=16
spp_nmS=paste0("sp_",c(1:n_spp))
spp_nm=spp_nmS[1]
sp_parallel_run=function(sp_nm){
  sink(file(paste0(working_dir,sp_nm,"_log.txt"), open="wt")) # redirect console output to a per-run log file
  cat('\n', 'Started on ', date(), '\n') 
  ptm0 <- proc.time()
  jnk=round(runif(1)*8000000) # dummy workload that takes an arbitrary amount of time to run
  jnk1=runif(jnk)
  for (i in seq_along(jnk1)){
    jnk1[i]=jnk1[i]*runif(1) # note: jnk1[i], not jnk[i] -- jnk is a scalar, so jnk[i] would be NA for i > 1
  }
  ptm1=proc.time() - ptm0
  jnk=as.numeric(ptm1[3])
  cat('\n','It took ', jnk, "seconds to model", sp_nm)

  #stop sinks
  sink.reset <- function(){
    for(i in seq_len(sink.number())){
      sink(NULL)
    }
  }
  sink.reset()
}
require(snowfall)
cpucores=as.integer(Sys.getenv('NUMBER_OF_PROCESSORS'))

sfInit(parallel=TRUE, cpus=cpucores)
sfExportAll() 
system.time((sfLapply(spp_nmS,fun=sp_parallel_run)))
sfRemoveAll()
sfStop()

sfInit(parallel=TRUE, cpus=cpucores)
sfExportAll() 
system.time(sfClusterApplyLB(spp_nmS,fun=sp_parallel_run)) 
sfRemoveAll()
sfStop()
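Regarding the error-handling difference mentioned above, one possible workaround (a sketch, not part of the original question; `safe_run` is a hypothetical wrapper name) is to wrap the worker function in tryCatch, so that a failing element returns its error message instead of aborting the whole load-balanced run:

```r
require(snowfall)

# Wrap the worker so an error in one element is captured rather than
# propagated; failed elements come back as "FAILED: ..." strings and
# can be identified and rerun afterwards.
safe_run <- function(sp_nm) {
  tryCatch(sp_parallel_run(sp_nm),
           error = function(e) paste("FAILED:", conditionMessage(e)))
}

sfInit(parallel = TRUE, cpus = cpucores)
sfExportAll()
res <- sfClusterApplyLB(spp_nmS, safe_run)
sfStop()
```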

The sfLapply function is useful because it splits the input values into one group of tasks per available worker, which is what the mclapply function calls prescheduling. This can give much better performance than sfClusterApplyLB when the individual tasks don't take long.

Here's an extreme example that demonstrates the advantages of prescheduling:

> system.time(sfLapply(1:100000, sqrt))
   user  system elapsed
  0.148   0.004   0.170
> system.time(sfClusterApplyLB(1:100000, sqrt))
   user  system elapsed
 19.317   1.852  21.222
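If you want load balancing without paying per-element dispatch overhead, a middle ground (a sketch under my own assumptions, not from the original answer; the `4 * cpucores` chunk count is an arbitrary choice) is to chunk the input yourself and load-balance the chunks:

```r
require(snowfall)
cpucores <- as.integer(Sys.getenv('NUMBER_OF_PROCESSORS'))

x <- 1:100000
# A few chunks per worker: far fewer dispatch messages than one per
# element, while stragglers can still be balanced at the chunk level.
chunks <- split(x, cut(seq_along(x), 4 * cpucores, labels = FALSE))

sfInit(parallel = TRUE, cpus = cpucores)
res <- unlist(sfClusterApplyLB(chunks, sqrt), use.names = FALSE)
sfStop()
```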
