
python multiprocessing pool, make one worker execute a different function

I have to perform some processing on each line of a file, and there are many files in an input directory. I have to dump the response I get from processing each line (from multiple input files) into a single result file.

I have decided on this flow: dump all the input files into a queue and fork 3-4 workers, where each worker works on a unique file, reads its content, and after processing dumps the response into a writer queue. There will be a separate process that reads this queue and writes the results to an output file.

I have come up with this code:

import csv
import multiprocessing
import os

def write_to_csv(queue):
    file_path = os.path.join(os.getcwd(), 'test_dir', "writer.csv")
    ofile = open(file_path, "w")
    job_writer = csv.writer(ofile, delimiter='\a')
    while True:
        line = queue.get()
        if line == 'kill':
            print("Kill Signal received")
            break
        if line:
            job_writer.writerow([str(line).strip()])
    ofile.close()

def worker_main(file_queue, writer_queue):
    print(os.getpid(), "working")
    while not file_queue.empty():
        file_name = file_queue.get(True)
        # somewhere in process_file, writer_queue.put(line_resp) is called
        # for every line in file_name
        process_file(file_name, writer_queue)


if __name__ == "__main__":
    file_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()

    writer_pool = multiprocessing.Pool(1, write_to_csv, (output_queue,))

    cwd = os.getcwd()
    test_dir = 'test_dir'
    file_list = os.listdir(os.path.join(cwd, test_dir))
    for file_name in file_list:
        file_queue.put(file_name)

    reader_pool = multiprocessing.Pool(3, worker_main, (file_queue, output_queue))
    reader_pool.close()
    reader_pool.join()

    output_queue.put("kill")

    print("Finished execution")

The code is working fine. But I wonder if it is possible to do the same thing with a single multiprocessing Pool, as opposed to using reader_pool and writer_pool in the code above.

You could do that with apply_async. Also, don't set an initializer (write_to_csv or worker_main in your case) when creating the Pool object, or every worker in the pool would run that task by default.

# Manager queues can be passed as apply_async arguments;
# plain multiprocessing.Queue objects cannot be pickled into tasks.
manager = multiprocessing.Manager()
file_queue = manager.Queue()
output_queue = manager.Queue()

cwd = os.getcwd()
test_dir = 'test_dir'
file_list = os.listdir(os.path.join(cwd, test_dir))
for file_name in file_list:
    file_queue.put(file_name)

pool = Pool(4)

pool.apply_async(write_to_csv, (output_queue,))
readers = [pool.apply_async(worker_main, (file_queue, output_queue)) for i in range(3)]

# wait for the readers to finish, then unblock the writer with the sentinel
for r in readers:
    r.get()
output_queue.put('kill')

pool.close()
pool.join()
