
Nested foreach loop in R where inner loop returns a matrix

I'm trying to parallelize a for loop that I have. There is a nested loop inside the loop in question that I'd like to parallelize. The answer is bound to be very similar to "nested foreach loops in R to update common array", but I can't seem to get it to work. I've tried all the options I can think of, including just turning the inner loop into its own function and parallelizing that, but I keep getting empty lists back.

The first, non-foreach example works:

theFrame <- data.frame(col1=rnorm(100), col2=rnorm(100))

theVector <- 2:30

regFor <- function(dataFrame, aVector, iterations)
{   
    #set up a blank results matrix to save into.
    results <- matrix(nrow=iterations, ncol=length(aVector))

    for(i in 1:iterations)
    {
        #set up a blank road map to fill with 1s according to desired parameters
        roadMap <- matrix(ncol=dim(dataFrame)[1], nrow=length(aVector), 0)
        row.names(roadMap) <- aVector
        colnames(roadMap) <- 1:dim(dataFrame)[1]

        for(j in 1:length(aVector))
        {
            #sample some of the 0s and convert to 1s according to desired number of sample
            roadMap[j,][sample(colnames(roadMap),aVector[j])] <- 1
        }

        temp <- apply(roadMap, 1, sum)

        results[i,] <- temp
    }

    results <- as.data.frame(results)
    names(results) <- aVector

    results
}

test <- regFor(theFrame, theVector, 2)

But this and my other similar attempts do not work:

trying <- function(dataFrame, aVector, iterations, cores)
{   
    registerDoMC(cores)

    #set up a blank results list to save into. i doubt i need to do this
    results <- list()

    foreach(i = 1:iterations, .combine="rbind") %dopar%
    {
        #set up a blank road map to fill with 1s according to desired parameters
        roadMap <- matrix(ncol=dim(dataFrame)[1], nrow=length(aVector), 0)
        row.names(roadMap) <- aVector
        colnames(roadMap) <- 1:dim(dataFrame)[1]

        foreach(j = 1:length(aVector)) %do%
        {
            #sample some of the 0s and convert to 1s according to desired number of sample
            roadMap[j,][sample(colnames(roadMap),aVector[j])] <- 1
        }

        results[[i]] <- apply(roadMap, 1, sum)
    }
    results
}

test2 <- trying(theFrame, theVector, 2, 2)

I take it that I have to use foreach on the inner loop no matter what, right?

When using foreach, you never "set up a blank results list to save into", as you suspected. Instead, the values produced by evaluating the body of the foreach loop are combined, and that combined result is returned. In this case, we want the outer foreach loop to combine vectors (computed by the inner foreach loop) row-wise into a matrix. That matrix is assigned to the variable results, which is then converted to a data frame.
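To make the "combine and return" behaviour concrete, here is a minimal sketch that is separate from the question's data. The variable name squares is made up for illustration, and %do% is used so that no parallel backend is needed:

```r
library(foreach)

# Each iteration returns a length-2 vector as the value of the loop body.
# .combine = 'rbind' stacks those vectors row-wise into a matrix,
# and the matrix is the return value of the whole foreach expression.
squares <- foreach(i = 1:3, .combine = 'rbind') %do% {
  c(i, i^2)
}

dim(squares)  # one row per iteration, one column per vector element
```

Nothing is assigned inside the loop body; the result exists only because the value of the foreach expression is stored in squares.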

Here's my first attempt at converting your example:

library(doMC)

foreachVersion <- function(dataFrame, aVector, iterations, cores) {
  registerDoMC(cores) # unusual, but reasonable with doMC
  rows <- nrow(dataFrame)
  cols <- length(aVector)
  results <-
    foreach(i=1:iterations, .combine='rbind') %dopar% {
      # The value of the inner foreach loop is returned as
      # the value of the body of the outer foreach loop
      foreach(aElem=aVector, .combine='c') %do% {
        roadMapRow <- double(length=rows)
        roadMapRow[sample(rows,aElem)] <- 1
        sum(roadMapRow)
      }     
    }
  results <- as.data.frame(results)
  names(results) <- aVector
  results
}

The inner loop doesn't need to be implemented as a foreach loop. You could also use sapply, but I'd try to figure out if there's a faster method. For this answer, though, I wanted to demonstrate a foreach method. The only real optimization that I used was to get rid of the call to apply by executing sum inside the inner foreach loop.
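As a rough sketch of the sapply alternative mentioned above, the inner loop could be written as a plain function that returns one row of results. The helper name innerCounts is hypothetical and is not part of the original code:

```r
# Hypothetical helper: computes one row of the results matrix with sapply
# instead of an inner foreach loop.
innerCounts <- function(rows, aVector) {
  sapply(aVector, function(aElem) {
    roadMapRow <- double(length = rows)      # a row of zeros
    roadMapRow[sample(rows, aElem)] <- 1     # set aElem random positions to 1
    sum(roadMapRow)
  })
}

set.seed(1)
oneRow <- innerCounts(rows = 100, aVector = 2:30)
length(oneRow)  # one value per element of aVector
```

A call such as innerCounts(rows, aVector) could then serve as the body of the outer foreach loop. Note that sample() draws without replacement by default, so in this toy example each sum simply equals aElem; a real workload would presumably compute something less trivial in the inner step.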

You need to assign the result of foreach to a variable:

    results<- foreach( ...
  • I know this is an old question, but here is a hint for those who cannot get nested foreach to work.
  • If you parallelize the outer loop with foreach()%dopar%{foreach()%do%{}}, you need to include .packages = c("doSNOW") in the arguments of the outer loop, otherwise you will run into a "doSNOW not found" error.
  • Generally, people just parallelize the inner loop (foreach()%:%foreach()%dopar%{}, as also suggested on the forum), which can be much slower for a huge amount of data (results are combined after every 100 results and also at the end of every inner loop, and this combining step is not parallel!).
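For reference, the %:% form mentioned in the last point might look roughly like this when applied to the example from the question. This is only a sketch: registerDoSEQ() registers the sequential backend so that the code runs anywhere, and the values of rows, aVector and iterations are assumed for illustration. For real parallelism, a backend such as doMC or doSNOW would be registered instead.

```r
library(foreach)

# Sequential backend so the sketch runs without a cluster;
# swap in registerDoMC(cores) or similar for parallel execution.
registerDoSEQ()

rows <- 100          # assumed: number of rows in the data frame
aVector <- 2:30
iterations <- 2

# %:% merges the two loops into a single stream of tasks;
# the inner results are combined with c(), the outer ones with rbind().
nested <- foreach(i = 1:iterations, .combine = 'rbind') %:%
  foreach(aElem = aVector, .combine = 'c') %dopar% {
    roadMapRow <- double(length = rows)
    roadMapRow[sample(rows, aElem)] <- 1
    sum(roadMapRow)
  }

dim(nested)  # iterations rows, length(aVector) columns
```

Whether this form or the %dopar%-outer / %do%-inner form is faster depends on the size of the tasks and the cost of combining, so it is worth timing both on the actual data.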
