
OSError: [Errno 12] Cannot allocate memory when using Python multiprocessing Pool

I am trying to apply a function to 5 cross-validation sets in parallel using Python's multiprocessing, and to repeat that for different parameter values, like so:

import pandas as pd
import numpy as np
import multiprocessing as mp
from sklearn.model_selection import StratifiedKFold

#simulated datasets
X = pd.DataFrame(np.random.randint(2, size=(3348,868), dtype='int8'))
y = pd.Series(np.random.randint(2, size=3348, dtype='int64'))

#dummy function to apply
def _work(args):
    del(args)

for C in np.arange(0.0,2.0e-3,1.0e-6):
    splitter = StratifiedKFold(n_splits=5)
    with mp.Pool(processes=5) as pool:
        pool_results = \
            pool.map(
                func=_work,
                iterable=((C,X.iloc[train_index],X.iloc[test_index]) for train_index, test_index in splitter.split(X, y))
            )

However, halfway through execution I get the following error:

Traceback (most recent call last):
  File "mre.py", line 19, in <module>
    with mp.Pool(processes=5) as pool:
  File "/usr/lib/python3.5/multiprocessing/context.py", line 118, in Pool
    context=self.get_context())
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 168, in __init__
    self._repopulate_pool()
  File "/usr/lib/python3.5/multiprocessing/pool.py", line 233, in _repopulate_pool
    w.start()
  File "/usr/lib/python3.5/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
  File "/usr/lib/python3.5/multiprocessing/context.py", line 267, in _Popen
    return Popen(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 20, in __init__
    self._launch(process_obj)
  File "/usr/lib/python3.5/multiprocessing/popen_fork.py", line 67, in _launch
    self.pid = os.fork()
OSError: [Errno 12] Cannot allocate memory

I'm running this on Ubuntu 16.04 with 32 GB of memory, and when I check htop during execution, usage never goes over 18.5 GB, so I don't think I'm running out of memory.
It is definitely due to splitting my dataframes with the indexes from splitter.split(X,y), since no error is thrown when I pass my dataframes directly to the Pool object.

I saw this answer that says it might be due to too many file dependencies being created, but I have no idea how I might go about fixing that. Isn't the context manager supposed to help avoid this sort of problem?

os.fork() makes a copy of a process, so if you're sitting at about 18 GB of usage and want to call fork, you need another 18 GB. Twice 18 is 36 GB, which is well over 32 GB. This analysis is (intentionally) naive, since some things don't get copied on fork, but it's probably sufficient to explain the problem.
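To put numbers on that estimate on a Linux machine, one option is to read /proc/meminfo just before the pool is created. The sketch below is only an illustration: parse_meminfo and fork_headroom_kb are hypothetical helper names, and the headroom figure uses the same naive "one full copy of the parent" assumption as the arithmetic above. The real decision is made by the kernel's overcommit accounting, which the CommitLimit and Committed_AS fields reflect.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are counts.
    """
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        parts = rest.split()
        if sep and parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def fork_headroom_kb(info, parent_kb):
    """Naive estimate: available memory minus one full copy of the parent.

    A negative result means the worst case (everything copied) would not fit.
    """
    return info["MemAvailable"] - parent_kb


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:  # Linux only
        info = parse_meminfo(fh.read())
    for key in ("MemTotal", "MemAvailable", "CommitLimit", "Committed_AS"):
        print(key, info.get(key), "kB")
```

With about 14 GB available and an 18 GB parent, fork_headroom_kb returns a negative number, which matches the 36 GB versus 32 GB reasoning.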

The solution is either to make the pools earlier, when less memory needs to be copied, or to work harder at sharing the largest objects. Or, of course, add more memory (perhaps just virtual memory, i.e., swap space) to the system.
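A minimal sketch of the first two suggestions, under some assumptions: the pool is created once, before the parameter loop, and each task receives only the parameter and index lists rather than sliced copies of the data. To keep it standalone, a plain list of lists (DATA) stands in for the DataFrame, and make_folds is a hypothetical replacement for StratifiedKFold.split. It also assumes a Unix-like system where the "fork" start method is available, so workers inherit DATA from the parent instead of receiving it pickled.

```python
import multiprocessing as mp
import random

# Stand-in for the large dataset. Workers forked from this process inherit it
# (copy-on-write), so it is not pickled and sent with every task.
DATA = [[random.randint(0, 1) for _ in range(20)] for _ in range(100)]


def make_folds(n_rows, n_splits):
    """Hypothetical helper: contiguous (train_idx, test_idx) index pairs."""
    bounds = [n_rows * k // n_splits for k in range(n_splits + 1)]
    folds = []
    for k in range(n_splits):
        test_idx = list(range(bounds[k], bounds[k + 1]))
        train_idx = list(range(0, bounds[k])) + list(range(bounds[k + 1], n_rows))
        folds.append((train_idx, test_idx))
    return folds


def _work(args):
    """Dummy task: gets only C and index lists, and slices DATA locally."""
    C, train_idx, test_idx = args
    train = [DATA[i] for i in train_idx]
    test = [DATA[i] for i in test_idx]
    return C, len(train), len(test)


def run_grid(c_values, n_splits=5):
    """Run _work over every fold for every C, reusing a single pool."""
    folds = make_folds(len(DATA), n_splits)
    ctx = mp.get_context("fork")  # assumption: Unix-like OS
    results = []
    # The pool is created once, so workers are forked once for the whole
    # grid rather than once per value of C.
    with ctx.Pool(processes=n_splits) as pool:
        for C in c_values:
            tasks = [(C, tr, te) for tr, te in folds]
            results.append(pool.map(_work, tasks))
    return results


if __name__ == "__main__":
    out = run_grid([0.0, 1e-6, 2e-6])
    print(len(out), "parameter values,", len(out[0]), "folds each")
```

In the original script, the same idea would mean moving the with mp.Pool(...) line above the for C loop and passing train_index and test_index instead of X.iloc[...] slices.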

