Parallel processing example in R

Question

First of all, I would like to say that I am new to this topic.

Secondly, although I have read a lot about parallel processing in R, I am still not confident about it.

I have just made up a small simulation in R. Can someone help me understand parallel processing using this made-up code, so that I can see how it works?

My code is as follows (large random numbers):

SimulateFn <- function(B, n) {
  M1 <- list()
  for (i in 1:B) {
    M1[i] <- n^2
  }
  return(M1)
}

SimulateFn(100000000, 300000)

Could you please help me?

Answer 1

First of all, parallelization is the procedure of dividing a task into subtasks that are processed simultaneously by multiple processors or cores. The subtasks can be independent or can share some dependency; the latter case needs more planning and attention.

This procedure has some overhead for scheduling the subtasks, such as copying data to each processor. That said, parallelization brings no benefit for computations that are already fast. In your example, the three main operations are indexing ( [ ), assignment ( <- ), and a (fast) math operation ( ^ ). The overhead of parallelization may be greater than the time needed to execute a subtask, in which case parallelization can result in poorer performance!
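To illustrate how little work each iteration does, the loop from the question can be rewritten sequentially, with or without an explicit loop. This is only a sketch: `simulate_loop` and `simulate_vectorized` are hypothetical names that do not appear in the question, and B is kept small so that it runs quickly.

```r
# Hypothetical helper names; B is deliberately small for a quick run.
simulate_loop <- function(B, n) {
  M1 <- vector("list", B)      # preallocate instead of growing the list
  for (i in seq_len(B)) {
    M1[[i]] <- n^2             # the only real work: one multiplication
  }
  M1
}

simulate_vectorized <- function(B, n) {
  as.list(rep(n^2, B))         # same list, built without an explicit loop
}

B <- 1000; n <- 300000
res_loop <- simulate_loop(B, n)
res_vec  <- simulate_vectorized(B, n)
identical(res_loop, res_vec)   # checks that both versions agree
```

When a task is this cheap, a sequential rewrite of this kind is usually worth trying before any parallel backend.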

Despite that, simple parallelization in R is fairly easy. An approach to parallelizing your task is provided below, using the doParallel package. Other approaches include using packages such as parallel.

library(doParallel)
## choose the number of processors/cores
cl <- makeCluster(2)
registerDoParallel(cl)
## record the elapsed time needed to evaluate each code snippet
B <- 100000; n <- 300000
## %dopar% executes the loop body in parallel; the workers cannot modify
## M1 in the calling session, so the list returned by foreach is kept instead
ptime <- system.time({
  M1 <- foreach(i = 1:B) %dopar% {
    n^2
  }
})
## %do% executes the same loop sequentially
stime <- system.time({
  M1 <- foreach(i = 1:B) %do% {
    n^2
  }
})

The elapsed times on my computer (2 cores) were 59.472 and 44.932 seconds, respectively. Clearly, there was no improvement from parallelization: indeed, performance was worse!
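One common way to reduce the scheduling overhead described above is to give each worker one large block of iterations instead of 100000 tiny tasks. The sketch below is an assumption about how that could look with the same packages; no timings were measured for it, and `n_workers`, `chunks` and `idx` are illustrative names. The block is self-contained, so if `cl` from the code above is still registered, the `makeCluster` and `registerDoParallel` lines can be skipped.

```r
library(doParallel)            # also attaches foreach, iterators and parallel

B <- 100000; n <- 300000
n_workers <- 2                 # assumption: two workers, as in the answer

cl <- makeCluster(n_workers)
registerDoParallel(cl)

# One block of indices per worker instead of one task per index.
chunks <- splitIndices(B, n_workers)

M1 <- foreach(idx = chunks, .combine = c) %dopar% {
  out <- vector("list", length(idx))
  for (k in seq_along(idx)) {
    out[[k]] <- n^2            # same cheap operation as before
  }
  out                          # each task returns a whole block of results
}

length(M1)                     # B elements, as in the original loop
# stopCluster(cl)              # run this once no more %dopar% work follows
```

With only `n_workers` tasks to schedule, the fixed cost per task is paid twice rather than B times, although for an operation as cheap as `n^2` the sequential version may still be faster.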

A better example is shown below, where the main task is much more expensive computationally:

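## two-species subset of iris: sepal length (column 1) and species (column 5)
x <- iris[which(iris[,5] != "setosa"), c(1,5)]
trials <- 10000
## each trial fits a logistic regression to a bootstrap sample of the 100 rows
ptime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %dopar% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})
## the same bootstrap, run sequentially for comparison
stime <- system.time({
  r <- foreach(icount(trials), .combine=cbind) %do% {
    ind <- sample(100, 100, replace=TRUE)
    result1 <- glm(x[ind,2]~x[ind,1], family=binomial(logit))
    coefficients(result1)
  }
})
## release the worker processes once the parallel work is finished
stopCluster(cl)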

The elapsed times were 24.709 and 34.502 seconds, respectively: the parallel version needed about 28% less time.
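The answer also mentions the parallel package as another approach. As a rough sketch of what that could look like for the same bootstrap, the block below uses `parSapply` from parallel. The helper `boot_fit`, the variable `dat`, the seed 123 and the reduced number of trials are all assumptions made for illustration, not part of the original answer.

```r
library(parallel)

# Same data as in the answer: sepal length and species, without setosa.
dat <- iris[which(iris[, 5] != "setosa"), c(1, 5)]
trials <- 200                  # far fewer than 10000, to keep the sketch quick

# Hypothetical helper: one bootstrap fit, returning the two coefficients.
boot_fit <- function(k, dat) {
  ind <- sample(100, 100, replace = TRUE)
  coefficients(glm(dat[ind, 2] ~ dat[ind, 1], family = binomial(logit)))
}

cl <- makeCluster(2)           # assumption: two workers
clusterSetRNGStream(cl, 123)   # reproducible random numbers on the workers
r <- parSapply(cl, seq_len(trials), boot_fit, dat = dat)
stopCluster(cl)                # release the workers when done

dim(r)                         # one column of coefficients per trial
```

Passing `dat` as an explicit argument avoids relying on the workers having access to variables from the calling session.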

Answer 1 (score 2, accepted, 2018-10-06 22:29:13)
