python multiprocessing pool, make one worker execute a different function
I have to perform some processing on each line of a file, and there are many files in an input directory. I have to dump the response I get from processing each line (from multiple input files) into a single result file.
I have decided on this flow: dump all the input files into a queue and fork 3-4 workers, where each worker works on a unique file, reads its content, and after processing dumps the response into a writer queue. There will be a separate process which reads this queue and writes the results to an output file.
I have come up with this code:
import csv
import multiprocessing
import os

def write_to_csv(queue):
    file_path = os.path.join(os.getcwd(), 'test_dir', "writer.csv")
    ofile = open(file_path, "w")
    job_writer = csv.writer(ofile, delimiter='\a')
    while 1:
        line = queue.get()
        if line == 'kill':
            print("Kill Signal received")
            break
        if line:
            job_writer.writerow([str(line).strip()])
    ofile.close()

def worker_main(file_queue, writer_queue):
    print(os.getpid(), "working")
    while not file_queue.empty():
        file_name = file_queue.get(True)
        # somewhere in process_file writer_queue.put(line_resp) is called
        # for every line in file_name
        process_file(file_name, writer_queue)

if __name__ == "__main__":
    file_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()

    writer_pool = multiprocessing.Pool(1, write_to_csv, (output_queue,))

    cwd = os.getcwd()
    test_dir = 'test_dir'
    file_list = os.listdir(os.path.join(cwd, test_dir))
    for file_name in file_list:
        file_queue.put(file_name)

    reader_pool = multiprocessing.Pool(3, worker_main, (file_queue, output_queue))
    reader_pool.close()
    reader_pool.join()

    output_queue.put("kill")
    print("Finished execution")
The code is working fine. But I wonder if it is possible to do the same thing with a single multiprocessing Pool, as opposed to using the separate reader_pool and writer_pool in the code above.
You could do that with apply_async. Also, don't set an initializer (write_to_csv or worker_main in your case) when creating the Pool object, or every worker in the pool will run it at startup by default.
file_queue = multiprocessing.Queue()
output_queue = multiprocessing.Queue()

cwd = os.getcwd()
test_dir = 'test_dir'
file_list = os.listdir(os.path.join(cwd, test_dir))
for file_name in file_list:
    file_queue.put(file_name)

pool = multiprocessing.Pool(4)
# one pool slot runs the writer task, the other three run the readers
pool.apply_async(write_to_csv, (output_queue,))
readers = [pool.apply_async(worker_main, (file_queue, output_queue)) for i in range(3)]
for reader in readers:
    reader.wait()            # wait until every file has been processed
output_queue.put("kill")     # then tell the writer to shut down
pool.close()
pool.join()
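As a further alternative: if process_file could be refactored to return its per-line results instead of pushing them onto a queue, a single Pool with imap_unordered would let the parent process do all the writing, so no writer task or "kill" sentinel is needed at all. A minimal sketch, where the refactored process_file and the line processing inside it are hypothetical placeholders:

import csv
import multiprocessing
import os

def process_file(file_path):
    # hypothetical refactor: return the per-line responses
    # instead of pushing them onto a writer queue
    results = []
    with open(file_path) as infile:
        for line in infile:
            results.append(line.strip())  # placeholder processing
    return results

if __name__ == "__main__":
    test_dir = os.path.join(os.getcwd(), 'test_dir')
    file_list = [os.path.join(test_dir, f) for f in os.listdir(test_dir)]

    # write the output outside test_dir so it is not picked up as input
    with open(os.path.join(os.getcwd(), "writer.csv"), "w") as ofile:
        job_writer = csv.writer(ofile, delimiter='\a')
        with multiprocessing.Pool(3) as pool:
            # the parent is the only writer, so no sentinel handshake
            # and no risk of interleaved writes on the output file
            for lines in pool.imap_unordered(process_file, file_list):
                for line in lines:
                    job_writer.writerow([line])

Here the pool workers only compute and the writing stays sequential in one place, which is usually simpler to reason about than coordinating a dedicated writer process.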