
python multiprocessing set memory per process

I am using python to do some processing on text files and am running into MemoryErrors. Sometimes the file being processed is quite large, which means that a multiprocessing process is using too much RAM.

Here is a snippet of my code:

import multiprocessing as mp
import os

def preprocess_file(file_path):
    with open(file_path, "r+") as f:
        file_contents = f.read()
        # modify the file_contents
        # ...
        # overwrite file
        f.seek(0)
        f.write(file_contents)
        f.truncate()

if __name__ == "__main__":
    with mp.Pool(mp.cpu_count()) as pool:
        pool_processes = []
        # for all files in dir
        for root, dirs, files in os.walk(some_path):
            for f in files:
                pool_processes.append(os.path.join(root, f))
        # start the processes
        pool.map(preprocess_file, pool_processes)

I have tried using the resource package to set a limit on how much RAM each process can use, as shown below, but this hasn't fixed the issue and I still get MemoryErrors, which leads me to believe it is pool.map that is causing the problem. I was hoping each process could handle the exception individually so that the file could be skipped rather than the whole program crashing.

import resource

def preprocess_file(file_path):
    try:
        hard = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") # total bytes of RAM in machine
        soft = (hard - 512 * 1024 * 1024) // mp.cpu_count() # split between each cpu and save 512MB for the system
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard)) # apply limit
        with open(file_path, "r+") as f:
            ...  # same read / modify / overwrite logic as above
    except Exception as e:  # bad practice - should be more specific but just a placeholder
        ...  # log the file and skip it
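A minimal standalone sketch (assuming Linux, since os.sysconf is used above) of what RLIMIT_AS actually does: it caps the virtual address space of the whole process, so once the soft limit is reached, any allocation can fail with MemoryError, not only the large file read.

import resource

soft = 1 * 1024 ** 3  # 1GB soft limit, comfortably above the interpreter's own usage
resource.setrlimit(resource.RLIMIT_AS, (soft, resource.RLIM_INFINITY))

try:
    data = bytearray(2 * 1024 ** 3)  # 2GB allocation: exceeds the soft limit
except MemoryError:
    print("allocation rejected once RLIMIT_AS would be exceeded")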

How can I let a single process run out of memory while the other processes continue unaffected? Ideally I would like to catch the exception within preprocess_file so that I can log exactly which file caused the error.

Edit: The preprocess_file function does not share data with any other process, so no shared memory is needed. The function also needs to read the entire file at once, because the file is reformatted in a way that cannot be done line by line.

Edit 2: The traceback from the program is below. As you can see, the error doesn't actually point to the file being processed, but rather comes from the package's own files.

Process ForkPoolWorker-2:
Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 125, in worker
  File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
  File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 39, in __init__
MemoryError

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 258, in _bootstrap
  File "/usr/lib64/python3.6/multiprocessing/process.py", line 93, in run
  File "/usr/lib64/python3.6/multiprocessing/pool.py", line 130, in worker
  File "/usr/lib64/python3.6/multiprocessing/queues.py", line 341, in put
  File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/usr/lib64/python3.6/multiprocessing/reduction.py", line 39, in __init__
MemoryError

If a MemoryError is raised, the worker process may or may not be able to recover from the situation. If it does, then, as @Thomas suggested, catch the MemoryError somewhere:

import multiprocessing as mp
from time import sleep


def initializer():
    # Probably set the memory limit here
    pass


def worker(i):
    sleep(1)

    try:
        if i % 2 == 0:
            raise MemoryError
    except MemoryError as ex:
        return str(ex)

    return i


if __name__ == '__main__':
    with mp.Pool(2, initializer=initializer) as pool:
        tasks = range(10)
        results = pool.map(worker, tasks)
        print(results)
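Putting the pieces together for this question: a sketch (untested; it simply combines the question's resource code with the initializer pattern above) that sets the limit once per worker and reports the offending path back to the parent:

import multiprocessing as mp
import os
import resource


def initializer():
    # Linux-only: cap each worker's virtual address space once, at fork
    # time, instead of on every task.
    hard = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    soft = (hard - 512 * 1024 * 1024) // mp.cpu_count()
    resource.setrlimit(resource.RLIMIT_AS, (soft, hard))


def preprocess_file(file_path):
    try:
        with open(file_path, "r+") as f:
            file_contents = f.read()
            # modify the file_contents ...
            f.seek(0)
            f.write(file_contents)
            f.truncate()
        return None
    except MemoryError:
        # Return the offending path instead of raising, so pool.map keeps
        # going and the parent can log exactly which file failed.
        return file_path


if __name__ == "__main__":
    some_path = "/path/to/files"  # placeholder, as in the question
    paths = [os.path.join(root, name)
             for root, dirs, files in os.walk(some_path)
             for name in files]
    with mp.Pool(mp.cpu_count(), initializer=initializer) as pool:
        failed = [p for p in pool.map(preprocess_file, paths) if p]
    print("files skipped due to MemoryError:", failed)

Note that the question's traceback shows the MemoryError being raised inside multiprocessing's own machinery (reduction.py, while pickling a result to send back), where a try/except inside the task cannot reach it: once the address-space limit is hit, any allocation in the worker can fail, not just the file read.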

If the worker cannot recover, the whole pool is unlikely to keep working. For example, change worker to force an exit:

def worker(i):
    sleep(1)

    try:
        if i % 2 == 0:
            raise MemoryError
        elif i == 5:
            import sys
            sys.exit()
    except MemoryError as ex:
        return str(ex)

    return i

and Pool.map never returns; it blocks forever.
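If hanging is unacceptable, one alternative (not part of the answer above) is concurrent.futures.ProcessPoolExecutor, which notices abruptly-dead workers and raises BrokenProcessPool instead of blocking:

import concurrent.futures as cf
import os
from concurrent.futures.process import BrokenProcessPool


def worker(i):
    if i == 5:
        os._exit(1)  # die abruptly, as an OOM-killed process would
    return i


if __name__ == "__main__":
    with cf.ProcessPoolExecutor(max_workers=2) as pool:
        try:
            print(list(pool.map(worker, range(10))))
        except BrokenProcessPool:
            print("a worker died; the pool failed loudly instead of hanging")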
