R parallel computing with snowfall - writing to files from separate workers

I am using the snowfall 1.84 package for parallel computing and would like each worker to write data to its own separate file during the computation. Is this possible? If so, how?

I am using a "SOCK" type connection, e.g. sfInit( parallel=TRUE, ...,type="SOCK" ), and would like the code to be platform independent (Unix/Windows).

I know it is possible to use the "slaveOutfile" option of sfInit to define a file to which the log output is written. But that is intended for debugging purposes, and all slaves/workers must use the same file. I need each worker to have its OWN output file!
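
For reference, this is roughly what that log-file approach looks like (the cpu count and file name are only illustrative):

library(snowfall)
# slaveOutfile redirects the output of ALL workers to one shared log file --
# useful for debugging, but not a per-worker data file
sfInit(parallel=TRUE, cpus=3, type="SOCK", slaveOutfile="all_workers.log")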

The data I need to write are large data frames, NOT simple diagnostic messages. These data frames need to be written out by the slaves and cannot be sent back to the master process. Does anyone know how I can get this done?

Thanks

A simple solution is to use sfClusterApply to execute a function that opens a different file on each of the workers, assigning the resulting file object to a global variable so you can write to it in subsequent parallel operations:

library(snowfall)
nworkers <- 3
sfInit(parallel=TRUE, cpus=nworkers, type='SOCK')

# Open a different file on each worker and keep the connection in a global
# variable (via <<-) inside that worker's R process
workerinit <- function(datfile) {
  fobj <<- file(datfile, 'w')
  NULL
}
# One file name per worker: sfClusterApply hands each worker exactly one
# element of the vector
sfClusterApply(sprintf('worker_%02d.dat', seq_len(nworkers)), workerinit)

# Subsequent parallel operations can write to the worker-local connection
work <- function(i) {
  write.csv(data.frame(x=1:3, i=i), file=fobj)
  i
}
sfLapply(1:10, work)
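# Optional: close each worker's file connection before shutting the cluster
# down. A minimal sketch -- it assumes the global 'fobj' opened by workerinit
# above and uses snowfall's sfClusterEval to run the expression on all workers.
sfClusterEval(close(fobj))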
sfStop()
