Python multiprocessing - Debugging OSError: [Errno 12] Cannot allocate memory

Question

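I am facing the following issue. I'm trying to parallelize a function that updates a file, but I cannot start the Pool() because of an OSError: [Errno 12] Cannot allocate memory. I looked around on the server, and it's not as if I'm using an old, weak machine or running out of actual memory (see htop). Also, free -m shows that I have plenty of RAM available in addition to about 7GB of swap. The files I'm trying to work with aren't that big either. I'll paste my code (and the stack trace) below; the sizes are as follows:

The predictionmatrix dataframe takes up about 80MB according to pandas' DataFrame.memory_usage().
The file geo.geojson is 2MB.

How do I go about debugging this? What can I check, and how? Thank you for any tips/tricks!

Code: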

import json
from functools import partial
from multiprocessing import Pool

def parallelUpdateJSON(paramMatch, predictionmatrix, data):
    for feature in data['features']: 
        currentfeature = predictionmatrix[(predictionmatrix['SId']==feature['properties']['cellId']) & paramMatch]
        if (len(currentfeature) > 0):
            feature['properties'].update({"style": {"opacity": currentfeature.AllActivity.item()}})
        else:
            feature['properties'].update({"style": {"opacity": 0}})

def writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix):
    with open('geo.geojson') as f:
        data = json.load(f)
    paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
    pool = Pool()
    func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
    pool.map(func, data)
    pool.close()
    pool.join()

    with open('output.geojson', 'w') as outfile:
        json.dump(data, outfile)

Stack trace:

---------------------------------------------------------------------------
OSError                                   Traceback (most recent call last)
<ipython-input-428-d6121ed2750b> in <module>()
----> 1 writeGeoJSON(6, 15, baseline)

<ipython-input-427-973b7a5a8acc> in writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix)
     14     print("Start loop")
     15     paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
---> 16     pool = Pool(2)
     17     func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
     18     print(predictionmatrix.memory_usage())

/usr/lib/python3.5/multiprocessing/context.py in Pool(self, processes, initializer, initargs, maxtasksperchild)
    116         from .pool import Pool
    117         return Pool(processes, initializer, initargs, maxtasksperchild,
--> 118                     context=self.get_context())
    119 
    120     def RawValue(self, typecode_or_type, *args):

/usr/lib/python3.5/multiprocessing/pool.py in __init__(self, processes, initializer, initargs, maxtasksperchild, context)
    166         self._processes = processes
    167         self._pool = []
--> 168         self._repopulate_pool()
    169 
    170         self._worker_handler = threading.Thread(

/usr/lib/python3.5/multiprocessing/pool.py in _repopulate_pool(self)
    231             w.name = w.name.replace('Process', 'PoolWorker')
    232             w.daemon = True
--> 233             w.start()
    234             util.debug('added worker')
    235 

/usr/lib/python3.5/multiprocessing/process.py in start(self)
    103                'daemonic processes are not allowed to have children'
    104         _cleanup()
--> 105         self._popen = self._Popen(self)
    106         self._sentinel = self._popen.sentinel
    107         _children.add(self)

/usr/lib/python3.5/multiprocessing/context.py in _Popen(process_obj)
    265         def _Popen(process_obj):
    266             from .popen_fork import Popen
--> 267             return Popen(process_obj)
    268 
    269     class SpawnProcess(process.BaseProcess):

/usr/lib/python3.5/multiprocessing/popen_fork.py in __init__(self, process_obj)
     18         sys.stderr.flush()
     19         self.returncode = None
---> 20         self._launch(process_obj)
     21 
     22     def duplicate_for_child(self, fd):

/usr/lib/python3.5/multiprocessing/popen_fork.py in _launch(self, process_obj)
     65         code = 1
     66         parent_r, child_w = os.pipe()
---> 67         self.pid = os.fork()
     68         if self.pid == 0:
     69             try:

OSError: [Errno 12] Cannot allocate memory

UPDATE

Following @robyschek's solution, I have updated my code to:

global g_predictionmatrix 

def worker_init(predictionmatrix):
    global g_predictionmatrix
    g_predictionmatrix = predictionmatrix    

def parallelUpdateJSON(paramMatch, data_item):
    for feature in data_item['features']: 
        currentfeature = g_predictionmatrix[(g_predictionmatrix['SId']==feature['properties']['cellId']) & paramMatch]
        if (len(currentfeature) > 0):
            feature['properties'].update({"style": {"opacity": currentfeature.AllActivity.item()}})
        else:
            feature['properties'].update({"style": {"opacity": 0}})

def use_the_pool(data, paramMatch, predictionmatrix):
    pool = Pool(initializer=worker_init, initargs=(predictionmatrix,))
    func = partial(parallelUpdateJSON, paramMatch)
    pool.map(func, data)
    pool.close()
    pool.join()


def writeGeoJSON(weekdaytopredict, hourtopredict, predictionmatrix):
    with open('geo.geojson') as f:
        data = json.load(f)
    paramMatch = (predictionmatrix['Hour']==hourtopredict) & (predictionmatrix['Weekday']==weekdaytopredict)
    use_the_pool(data, paramMatch, predictionmatrix)     
    with open('trentino-grid.geojson', 'w') as outfile:
        json.dump(data, outfile)

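I still get the same error. Also, according to the documentation, map() should divide my data into chunks, so I don't think it should replicate my 80MB rownum times. I may be wrong, though... :) I've also noticed that if I use a smaller input (~11MB instead of 80MB) I don't get the error. So I guess I'm trying to use too much memory, but I can't imagine how it goes from 80MB to something that 16GB of RAM can't handle.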

Answer 1

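When using a multiprocessing.Pool, the default way to start the processes is fork. The issue with fork is that the entire process is duplicated (details here). Thus, if your main process is already using a lot of memory, this memory will be duplicated, which leads to this memory error. For instance, if your main process uses 2GB of memory and you use 8 subprocesses, you will need 18GB of RAM.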
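The arithmetic in this example can be written out as a rough worst-case estimate. This is only a sketch: `parent_peak_rss_bytes` and `worst_case_bytes` are hypothetical helper names, and the estimate assumes that every worker ends up with a full private copy of the parent's memory. On Linux, fork shares pages copy-on-write, so real usage is usually lower, although the kernel may still refuse the fork depending on its overcommit settings.

```python
import resource
import sys

def parent_peak_rss_bytes():
    """Peak resident set size of the current process, in bytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux and in bytes on macOS.
    return peak if sys.platform == "darwin" else peak * 1024

def worst_case_bytes(parent_bytes, n_workers):
    """Parent plus one full copy per forked worker (pessimistic upper bound)."""
    return parent_bytes * (1 + n_workers)

if __name__ == "__main__":
    gib = 1024 ** 3
    # The example from the answer: a 2GB parent and 8 workers.
    print(worst_case_bytes(2 * gib, 8) / gib)  # 18.0
    # The same estimate for the process running this script.
    print(worst_case_bytes(parent_peak_rss_bytes(), 8) / gib)
```

Calling `parent_peak_rss_bytes()` right before creating the Pool gives a quick idea of how much memory the parent holds at the moment of the fork.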

You should try using a different start method, such as 'forkserver' or 'spawn':

from functools import partial
from multiprocessing import set_start_method, Pool
set_start_method('forkserver')

# You can then start your Pool without each process
# cloning your entire memory
pool = Pool()
func = partial(parallelUpdateJSON, paramMatch, predictionmatrix)
pool.map(func, data)

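These methods avoid duplicating the workspace of your process, but they can be a bit slower to start because the modules you are using need to be reloaded.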
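A self-contained sketch of the same idea, for illustration only. It uses `multiprocessing.get_context()` so the start method applies to one pool instead of the whole program, and a worker initializer so the large object is sent once per worker instead of once per task. The names `init_worker`, `lookup`, `run` and `_shared` are hypothetical, and the small dict stands in for the question's DataFrame.

```python
import multiprocessing as mp

_shared = None  # set once in each worker by the initializer

def init_worker(table):
    """Runs once in every worker; keeps the large object in a module global."""
    global _shared
    _shared = table

def lookup(key):
    """Task function: only the small key is pickled for each task."""
    return _shared.get(key, 0)

def run(keys, table, method="forkserver"):
    # get_context() scopes the start method to this pool; it does not
    # change the process-wide default the way set_start_method() does.
    ctx = mp.get_context(method)
    with ctx.Pool(processes=2, initializer=init_worker,
                  initargs=(table,)) as pool:
        return pool.map(lookup, keys)

if __name__ == "__main__":
    # The guard matters: with 'forkserver' and 'spawn' the workers
    # import this module, and must not start pools of their own.
    table = {"a": 0.3, "b": 0.7}
    print(run(["a", "b", "c"], table))  # [0.3, 0.7, 0]
```

Note that with 'forkserver' or 'spawn' the `initargs` are pickled and sent to each worker, so every worker still holds its own copy of `table`; what is avoided is inheriting the parent's whole address space.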

Answer 2

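We have run into this a couple of times. According to my sysadmin, there is "a bug" in Unix that raises the same error whether you are out of memory or your process has reached the maximum file descriptor limit.

We had a file descriptor leak, and the error raised was [Errno 12] Cannot allocate memory#012OSError.

So you should look at your script and double-check that the problem is not the creation of too many FDs instead.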
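One way to check for this, sketched under the assumption of a Linux system where /proc/self/fd is available: compare the number of descriptors the process currently holds with its limit. The helper names `fd_limits` and `open_fd_count` are made up for this example.

```python
import os
import resource

def fd_limits():
    """Return the (soft, hard) limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def open_fd_count():
    """Count descriptors open in this process (Linux only).

    os.listdir() briefly opens the directory itself, so the count
    includes one extra descriptor.
    """
    return len(os.listdir("/proc/self/fd"))

if __name__ == "__main__":
    soft, hard = fd_limits()
    print("open fds:", open_fd_count())
    print("soft limit:", soft, "hard limit:", hard)
```

Printing `open_fd_count()` before each Pool is created, or periodically in a long-running process, shows whether the number keeps growing, which would point to a leak.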

2 solutions

Solution 1
7 2017-03-03 20:31:31

Solution 2
5 Accepted 2017-08-10 17:56:53
