Python: Writing to a single file with a queue while using a multiprocessing Pool

Question

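I have hundreds of thousands of text files that I want to parse in various ways. I want to save the output to a single file without synchronization problems. I have been using a multiprocessing pool to do this to save time, but I can't figure out how to combine Pool and Queue.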

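The following code saves the infile name along with the maximum number of consecutive "x"s in the file. However, I want all processes to save their results to the same file, not to different files as in my example. Any help on this would be greatly appreciated.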

import multiprocessing
import re

with open('infilenamess.txt') as f:
    filenames = f.read().splitlines()

def mp_worker(filename):
    with open(filename, 'r') as f:
        text = f.read()
    m = re.findall("x+", text)
    count = len(max(m, key=len))
    # Each worker appends to its own per-input results file.
    with open(filename + '_results.txt', 'a') as outfile:
        outfile.write(str(filename) + '|' + str(count) + '\n')

def mp_handler():
    p = multiprocessing.Pool(32)
    p.map(mp_worker, filenames)

if __name__ == '__main__':
    mp_handler()

Answer 1

Multiprocessing pools implement a queue for you. Just use a pool method that returns the workers' return values to the caller. imap works well:

import multiprocessing 
import re

def mp_worker(filename):
    with open(filename) as f:
        text = f.read()
    m = re.findall("x+", text)
    # default=0 avoids a ValueError when the file contains no "x"
    count = max((len(run) for run in m), default=0)
    return filename, count

def mp_handler():
    p = multiprocessing.Pool(32)
    with open('infilenamess.txt') as f:
        filenames = [line for line in (l.strip() for l in f) if line]
    with open('results.txt', 'w') as f:
        for result in p.imap(mp_worker, filenames):
            # (filename, count) tuples from worker
            f.write('%s: %d\n' % result)

if __name__=='__main__':
    mp_handler()
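
As a variation on the approach above, here is a minimal sketch that uses imap_unordered with a chunksize. This may help when there are very many small files and the order of lines in the output does not matter. The names longest_x_run and write_results, the processes and chunksize defaults, and the '|' separator (borrowed from the question) are illustrative assumptions, not part of the original answer.

```python
import multiprocessing
import re


def longest_x_run(filename):
    """Return (filename, length of the longest run of 'x' in the file)."""
    with open(filename) as f:
        text = f.read()
    runs = re.findall("x+", text)
    # default=0 covers files that contain no 'x' at all
    return filename, max((len(run) for run in runs), default=0)


def write_results(filenames, out_path, processes=4, chunksize=16):
    """Parse files in worker processes; only this process writes out_path."""
    with multiprocessing.Pool(processes) as pool, open(out_path, "w") as out:
        # imap_unordered yields each result as soon as it is ready, so the
        # output order may differ from the input order.
        results = pool.imap_unordered(
            longest_x_run, filenames, chunksize=chunksize
        )
        for name, count in results:
            out.write("%s|%d\n" % (name, count))
```

As in the code above, the call itself (for example write_results(filenames, 'results.txt')) should sit under an if __name__ == '__main__': guard, which matters on platforms that start worker processes by spawning rather than forking.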

Answer 2

I took the accepted answer and simplified it to understand for myself how this works. I am posting it here in case it helps someone else.

import multiprocessing

def mp_worker(number):
    number += 1
    return number

def mp_handler():
    p = multiprocessing.Pool(32)
    numbers = list(range(1000))
    with open('results.txt', 'w') as f:
        for result in p.imap(mp_worker, numbers):
            f.write('%d\n' % result)

if __name__=='__main__':
    mp_handler()

Answer 3

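Here is my approach using a multiprocessing Manager object. The nice thing about this approach is that when processing drops out of the manager's with block in the run_multi() function, the file writer queue is closed automatically, which keeps the code very easy to read and saves you the hassle of stopping the queue listener yourself.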

from functools import partial
from multiprocessing import Manager, Pool, Queue
from random import randint
import time

def run_multi():
    input = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
    with Manager() as manager:
        pool = Pool()  # By default pool will size depending on cores available
        message_queue = manager.Queue()  # Queue for sending messages to file writer listener
        pool.apply_async(file_writer, (message_queue, ))  # Start file listener ahead of doing the work
        pool.map(partial(worker, message_queue=message_queue), input)  # Partial function allows us to use map to divide workload

def worker(input: int, message_queue: Queue):
    message_queue.put(input * 10)
    time.sleep(randint(1, 5))  # Simulate hard work

def file_writer(message_queue: Queue):
    with open("demo.txt", "a") as report:
        while True:
            report.write(f"Value is: {message_queue.get()}\n")

if __name__ == "__main__":
    run_multi()
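
The listener pattern above can also be shut down explicitly. The following is a minimal sketch, not the answerer's code: once all workers are done it puts a sentinel value on the queue and waits for the writer to return, so shutdown does not depend on the manager exiting while the writer is still blocked in get(). SENTINEL, the values, path and processes parameters, and the returned message count are hypothetical additions. The pool needs at least two processes because the writer occupies one of them for the whole run.

```python
from functools import partial
from multiprocessing import Manager, Pool

SENTINEL = None  # hypothetical marker that tells the writer to stop


def worker(value, message_queue):
    # Send the result to the writer instead of touching the file directly
    message_queue.put(value * 10)


def file_writer(message_queue, path):
    written = 0
    with open(path, "a") as report:
        # iter(callable, sentinel) keeps calling get() until it returns SENTINEL
        for message in iter(message_queue.get, SENTINEL):
            report.write(f"Value is: {message}\n")
            written += 1
    return written


def run_multi(values, path, processes=4):
    if processes < 2:
        raise ValueError("need one process for the writer and one for workers")
    with Manager() as manager, Pool(processes) as pool:
        message_queue = manager.Queue()
        # Start the single writer before handing out the work
        writer = pool.apply_async(file_writer, (message_queue, path))
        pool.map(partial(worker, message_queue=message_queue), values)
        message_queue.put(SENTINEL)  # all workers are done; stop the writer
        return writer.get()  # wait until the file has been written and closed
```

Because only file_writer opens the file, the lines never interleave, although their order depends on which worker finishes first.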

Answer 1
42, accepted, 2014-10-27 23:15:59

Answer 2
13, 2017-10-18 02:03:37

Answer 3
1, 2020-05-18 19:15:30
