Parallelize nested for-loop on 3 dimensional array in R

Using R on a Windows machine, I am currently running a nested loop on a 3D array (720x360x1368). The loop cycles through d1 and d2, applies a function over d3, and assembles the output into a new array with the same dimensions.

In the following reproducible example, I have reduced the dimensions by a factor of 10 to make execution faster.

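library(SPEI)

# Random positive test data; rnorm(50) is recycled to fill the array
old.array = array(abs(rnorm(50)), dim=c(72,36,136))

# Empty array with the same dimensions to hold the results
new.array = array(dim=c(72,36,136))

for (i in 1:72) {
  for (j in 1:36) {
    # Monthly time series starting in January 1901; SPI at a 1-month scale
    new.listoflists <- spi(ts(old.array[i,j,], frequency=12, start=c(1901,1)), 1, na.rm = TRUE)
    new.array[i,j,] = new.listoflists$fitted
  }
}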

where spi() is a function from the SPEI package returning a list of lists, of which one particular list, $fitted, of length 1368 (136 in the reduced example) is used in each loop iteration to construct the new array.
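The same pattern (apply a function along d3 for every d1/d2 pair) can also be written with base R's apply(). This is only a minimal sketch: f is a hypothetical stand-in for spi(...)$fitted and simply returns a vector as long as its input. Note that apply() puts the function's output in the first dimension, so aperm() is needed to restore the original order.

```r
# Small test array so the example runs instantly
set.seed(1)
old.array <- array(abs(rnorm(4 * 3 * 24)), dim = c(4, 3, 24))

# Hypothetical stand-in for spi(...)$fitted: standardize each series
f <- function(x) (x - mean(x)) / sd(x)

# Reference result from the nested loop
loop.array <- array(dim = dim(old.array))
for (i in seq_len(dim(old.array)[1])) {
  for (j in seq_len(dim(old.array)[2])) {
    loop.array[i, j, ] <- f(old.array[i, j, ])
  }
}

# apply() over margins 1 and 2 returns an array of dim (24, 4, 3):
# the function output comes first, so move it back to the last position
apply.array <- aperm(apply(old.array, c(1, 2), f), c(2, 3, 1))

# Compare the two results
all.equal(loop.array, apply.array)
```

This does not make the computation parallel by itself, but it shows the reshaping step that any "one result vector per (i, j)" approach needs.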

While this loop works flawlessly, it takes quite a long time to compute. I have read that foreach can be used to parallelize for loops.

However, I do not understand how the nesting and the assembling of the new array can be achieved such that the dimnames of the old and the new array are consistent.

(In the end, what I want to be able to do is transform both the old and the new array into a "flat" long panel data frame using as.data.frame.table() and merge them along their three dimensions.)
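To make the intended end result concrete, here is a small sketch of that flatten-and-merge step. The dimension names lon, lat and time and the "times 10" computation are assumptions for illustration only; the point is that both arrays must carry identical dimnames for the merge keys to line up.

```r
# Two small arrays that share the same dimnames (names are hypothetical)
dn <- list(lon = paste0("x", 1:3), lat = paste0("y", 1:2), time = paste0("t", 1:4))
old.array <- array(1:24, dim = c(3, 2, 4), dimnames = dn)

# Copy the dimnames over, then fill in the values
new.array <- array(dim = dim(old.array), dimnames = dimnames(old.array))
new.array[] <- old.array * 10   # placeholder for the real computation

# One row per cell; the dimnames become the key columns
old.df <- as.data.frame.table(old.array, responseName = "old")
new.df <- as.data.frame.table(new.array, responseName = "new")

# Merge along all three dimensions
panel <- merge(old.df, new.df, by = c("lon", "lat", "time"))
head(panel)
```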

Any help on how I can achieve the desired output using parallel computing will be highly appreciated!

Cheers
CubicTom

It would have been better with a reproducible example, but here is what I came up with:

First create the cluster to use

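library(foreach)
library(doSNOW)  # also loads snow, which provides makeCluster()
cl <- makeCluster(6, type = "SOCK")  # six socket workers; adjust to your core count
registerDoSNOW(cl)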

Then you create the loop and close the cluster:

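zz <- foreach(i = 1:720, .combine = c) %:%
  foreach(j = 1:360, .combine = c) %dopar% {
    # FUN is a placeholder for the real function, e.g. spi() from the question
    # (in that case pass .packages = "SPEI" to foreach() so the workers load it)
    new.listoflists <- FUN(old.array[i, j, ])
    # Return the result as the value of the block: assigning into new.array
    # on a worker would only change that worker's copy
    new.listoflists$list
  }
stopCluster(cl)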

Because both loops use .combine = c, zz contains the result of every iteration one after another (all values for i = 1, j = 1 first, then i = 1, j = 2, and so on). You can then bind them together with:

new.obj <- plyr::ldply(zz, data.frame)
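If you need a 3D array rather than a data frame, the concatenated results can be reshaped with base R. The sketch below is an illustration under assumptions: f is a hypothetical stand-in for FUN(...)$list, and the vector zz is emulated with a sequential loop so the snippet runs without a cluster. Since the third dimension varies fastest and i slowest, the values are first filled into a reversed-dimension array and then permuted.

```r
set.seed(42)
d <- c(4, 3, 5)
old.array <- array(abs(rnorm(prod(d))), dim = d,
                   dimnames = list(paste0("r", 1:4), paste0("c", 1:3), paste0("t", 1:5)))

# Hypothetical stand-in for FUN(...)$list
f <- function(x) cumsum(x)

# Emulate the flat vector that a nested loop with .combine = c returns:
# i is the outer (slowest) index, j the inner one
zz <- c()
for (i in seq_len(d[1])) {
  for (j in seq_len(d[2])) {
    zz <- c(zz, f(old.array[i, j, ]))
  }
}

# Fill an array in (d3, d2, d1) order, then permute back to (d1, d2, d3)
new.array <- aperm(array(zz, dim = rev(d)), c(3, 2, 1))

# Reuse the dimnames of the input so both arrays stay consistent
dimnames(new.array) <- dimnames(old.array)
```

With matching dimnames on both arrays, as.data.frame.table() produces the same key columns for each, which is what the merge in the question relies on.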

Hope this helps you!

I used smaller dimensions than in your question because I wanted to check that the behavior was correct. Here I use mapply(), which takes multiple arguments and returns a list of results. I then wrap it with matrix() to get the dimensions you hoped for. Please note that i is repeated using times and j is repeated using each. This is critical because matrix() fills column by column: it runs down the rows of one column and wraps to the next column once the number of rows is reached.

new.array = array(1:(5*10*4), dim=c(5,10,4))

# FUN: stand-in function that returns a list (each element of x repeated 3 times)
FUN <- function(x){
    list(lapply(x, rep, times=3))
}

# result of the computation: a 5 x 10 list-matrix, one cell per (i,j) pair
result <- matrix(
    mapply(
        function(i,j,arr){
            FUN(arr[i,j,])
        }
        ,i = rep(1:nrow(new.array),times=ncol(new.array))
        ,j = rep(1:ncol(new.array),each=nrow(new.array))
        ,MoreArgs = list(arr=new.array)  # passed whole to every call, not iterated over
    )
    ,nrow=nrow(new.array)
    ,ncol=ncol(new.array)
)
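To see why the times/each pairing matters, here is a tiny check with made-up sizes (3 x 4) and a toy function that encodes each (i, j) pair as a number. It is only meant to show where matrix() places each result.

```r
nr <- 3
nc <- 4

# Index vectors in the order matrix() fills its cells (down each column)
i <- rep(seq_len(nr), times = nc)  # 1 2 3 1 2 3 ...
j <- rep(seq_len(nc), each = nr)   # 1 1 1 2 2 2 ...

# Encode each (i, j) pair as 10 * i + j so the placement is easy to check
m <- matrix(mapply(function(i, j) 10 * i + j, i, j), nrow = nr, ncol = nc)

m[2, 3]  # 23: row 2, column 3 holds the value computed for i = 2, j = 3
```

Swapping times and each would pair the indices in row-major order instead, and the results would end up in the wrong cells.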
