Python multiprocessing processes go to sleep after a while

Question

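I have a script that walks a directory, searches every file with a given ending (e.g. .xml) for given strings, and replaces them. To do this, I use the Python multiprocessing library.

As an example, I am using 1100 .xml files, about 200 MB of data. On my 15-inch MBP ('15), a full run takes 8 minutes.

But after a few minutes, one process after another goes to sleep, as I can see in `top` (this is after 7 minutes...).

top output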

PID   COMMAND      %CPU  TIME     #TH    #WQ  #PORT MEM    PURG   CMPR PGRP PPID STATE    BOOSTS         %CPU_ME %CPU_OTHRS
1007  Python       0.0   07:03.51 1      0    7     5196K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1006  Python       99.8  07:29.07 1/1    0    7     4840K  0B     0B   998  998  running  *0[1]          0.00000 0.00000
1005  Python       0.0   02:10.02 1      0    7     4380K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1004  Python       0.0   04:24.44 1      0    7     4624K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1003  Python       0.0   04:25.34 1      0    7     4572K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000
1002  Python       0.0   04:53.40 1      0    7     4612K  0B     0B   998  998  sleeping *0[1]          0.00000 0.00000

So now only one process is doing all the work, while the others went to sleep after 4 minutes.

Code snippet

# set cpu pool to cores in computer
pool_size = multiprocessing.cpu_count()

# create pool
pool = multiprocessing.Pool(processes=pool_size)

# give pool function and input data - here for each file in file_list
pool_outputs = pool.map(check_file, file_list)

# if no more tasks are available: close all
pool.close()
pool.join()

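So why do all the processes go to sleep?

My guess: the file list is split among all workers in the pool (the same number of files for each), and a few were just "lucky" to get small files and therefore finished earlier. Could this be true? I had assumed it works more like a queue, so that each worker picks up a new file as soon as it finishes, until the list is empty.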

Answer 1

As @Felipe-Lema pointed out, this is a classic RTFM.
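For context, the manual says that `Pool.map` chops the iterable into chunks and submits each chunk as a separate task, so a worker whose chunks happen to be cheap can run out of work while another is still busy. The following is only a minimal sketch of one way to get queue-like behaviour while staying with `Pool`: `imap_unordered` with `chunksize=1` hands out one file per task. The body of `check_file` here is a placeholder (the real one is not shown in the question), and `check_all` and the file names are hypothetical.

```python
import multiprocessing


def check_file(path):
    """Placeholder for the real check_file(); returns the length of the name."""
    return len(path)


def check_all(pool, file_list):
    """Hand out one file per task so an idle worker always pulls the next file."""
    # chunksize=1: every task is a single file instead of a pre-cut batch.
    # Results arrive in completion order, not in input order.
    return list(pool.imap_unordered(check_file, file_list, chunksize=1))


if __name__ == "__main__":
    files = ["a.xml", "bb.xml", "ccc.xml"]  # hypothetical file names
    pool = multiprocessing.Pool(processes=multiprocessing.cpu_count())
    try:
        results = check_all(pool, files)
    finally:
        pool.close()
        pool.join()
    print(sorted(results))
```

Because the results come back in completion order, anything that needs to know which file a result belongs to would have to return the file name along with the status.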

I rewrote the part of the script mentioned above using multiprocessing Queues instead of a Pool, and the runtime improved:

import multiprocessing
from multiprocessing import Process, Queue

def check_files(file_list):
    """Checks and replaces lines in files
    @param file_list: list of files to search
    @return results: list of non-None statuses reported by the workers """

    # as many workers as CPUs are available (HT included)
    workers = multiprocessing.cpu_count()

    # create two queues: one for files, one for results
    work_queue = Queue()
    done_queue = Queue()
    processes = []

    # add every file to work queue
    for filename in file_list:
        work_queue.put(filename)

    # start processes
    for w in range(workers):
        p = Process(target=worker, args=(work_queue, done_queue))
        p.start()
        processes.append(p)
        work_queue.put('STOP')

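    # wait until all processes have finished
    # (caveat: join() can block while a worker still has unflushed items on
    # done_queue; with many or large results, drain the queue before joining)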
    for p in processes:
        p.join()

    done_queue.put('STOP')

    # beautify results and return them
    results = []
    for status in iter(done_queue.get, 'STOP'):
        if status is not None:
            results.append(status)

    return results
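The `worker` function passed to `Process` is not shown in the answer. A minimal sketch of what it might look like, assuming it consumes file names until it sees the 'STOP' sentinel and puts one status per file on `done_queue`; the `check_file` body below is a hypothetical stand-in, not the author's code:

```python
def check_file(filename):
    """Hypothetical stand-in: report .xml files, return None for anything else."""
    return filename if filename.endswith(".xml") else None


def worker(work_queue, done_queue):
    """Process file names from work_queue until the 'STOP' sentinel arrives."""
    # iter(callable, sentinel) keeps calling work_queue.get() and stops
    # as soon as it returns 'STOP'
    for filename in iter(work_queue.get, 'STOP'):
        done_queue.put(check_file(filename))
```

Since every worker takes the next file name off the shared queue only when it has finished the previous one, no worker sits idle while files remain, which matches the queue behaviour the question expected.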

Answer 1 (2, accepted, 2015-08-19 14:07:50)
