
How to find / modify objects directly on parallel workers in R

I have an expensive optimization problem that I'm trying to split into pieces.
It consists of an expensive initial setup step followed by a recursive structure: the workers can perform only one step at a time before the results have to be collected and a new task sent to the workers.

A complicating feature is that the initial setup step for the sub-computations has to be performed directly on each worker; its result cannot be exported to the worker via clusterExport or similar.

I had hoped to use clusterApply to store the outcome of this initial setup on the specific worker, but I can't seem to achieve this.

The first part of my code below shows my current attempts and describes what I would like; the second is an attempt to list all objects available on the worker and where they are located.

library(parallel)
### What I would like to do:
test2<-function(){
  MYOBJECT <-0
  cl=makeCluster(2,type='PSOCK')
  clusterExport(cl,c('MYOBJECT'),envir = environment())

  clusterApply(cl,1:2,function(x) { #attempt to modify / create MYOBJECT on the worker processes
    y <- x * 2 #expensive operation I only want to do once, that *cannot* be exported to the worker
    MYOBJECT <<- y
    MYOBJECT <- y
    assign('MYOBJECT',y,envir = parent.frame()) #envs[[1]])
  })

  clusterApply(cl,1:2,function(x) MYOBJECT * .5) #cheap operation to be done many times
}

test2()  #should return a list of 1 and 2, without assignment into the test2 function environment / re exporting



#trying to find out where MYOBJECT is on the worker
test<-function(){
  MYOBJECT <-1
  cl=makeCluster(1,type='PSOCK')
  clusterExport(cl,c('MYOBJECT'),envir = environment())

  clusterApply(cl,1,function(x) {
    MYOBJECT <<- list('hello')
    assign('MYOBJECT',list('hellohello'),envir = parent.frame()) #envs[[1]])
  })

  clusterApply(cl,1,function(x) 
    lapply(sys.frames(),ls) #where is MYOBJECT?
  )
}

test()

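The solution turned out to be simple: to modify the contents of an individual worker in a persistent manner, the assignment inside the function passed to clusterApply needs to be made into the worker's global environment. As far as I can tell, the earlier attempts fail because the anonymous function is serialized together with a copy of the test2 environment, so `<<-` only changes that temporary copy, and parent.frame() on the worker is likewise a frame that disappears when the call returns. For the same reason I store the result under a new name, MYOBJECT2: a lookup of MYOBJECT would find the value 0 in the copied test2 environment before it ever reached the worker's global environment.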

library(parallel)

test2<-function(){
  MYOBJECT <-0
  cl=makeCluster(2,type='PSOCK')
  on.exit(stopCluster(cl)) #shut the workers down when test2 returns
  clusterExport(cl,c('MYOBJECT'),envir = environment())

  clusterApply(cl,1:2,function(x) { #create MYOBJECT2 on each worker process
    y <- x * 2 #expensive operation I only want to do once, that *cannot* be exported to the worker
    assign('MYOBJECT2',y,envir = globalenv()) #the worker's global environment persists between calls
  })

  clusterApply(cl,1:2,function(x) MYOBJECT2 * .5) #cheap operation to be done many times
}

test2()  #returns a list of 1 and 2, without assignment into the test2 function environment / re exporting
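
For the "set up once, then iterate" structure described in the question, the same idea can be sketched as a loop. This is only a rough illustration under a few assumptions: the names setup_worker, cheap_step, run_steps and WORKER_STATE are hypothetical, x * 2 again stands in for the expensive setup, and doubling scale stands in for the master deciding the next task from the collected results. The worker functions are defined at top level so that no function environment gets shipped along with them.

```r
library(parallel)

# Hypothetical helper: runs once on a worker and keeps its result there.
setup_worker <- function(x) {
  state <- x * 2  # stand-in for the expensive setup that cannot be exported
  assign("WORKER_STATE", state, envir = globalenv())
  invisible(NULL)  # nothing useful to send back to the master
}

# Hypothetical helper: cheap step that reads the state stored on the worker.
cheap_step <- function(scale) {
  get("WORKER_STATE", envir = globalenv()) * scale
}

run_steps <- function(n_iter = 3) {
  cl <- makeCluster(2, type = "PSOCK")
  on.exit(stopCluster(cl), add = TRUE)  # always shut the workers down

  # One-time setup: with two workers and two inputs, each worker gets one input.
  clusterApply(cl, 1:2, setup_worker)

  scale <- 0.5
  results <- NULL
  for (i in seq_len(n_iter)) {
    # One step on every worker, then collect before issuing the next task.
    results <- clusterCall(cl, cheap_step, scale)
    scale <- scale * 2  # stand-in for choosing the next task from the results
  }
  results
}
```

Calling run_steps(1) should reproduce the 1 and 2 from test2(); larger n_iter values reuse the stored state without repeating the setup.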
