multiprocessing Pool.imap broken?

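I've tried both the multiprocessing included in the python2.6 Ubuntu package (__version__ says 0.70a1) and the latest from PyPI (2.6.2.1). In both cases I don't know how to use imap correctly - it causes the entire interpreter to stop responding to ctrl-C (map works fine though). pdb shows next() is hanging on the condition variable wait() call in IMapIterator, so nobody is waking us up. Any hints? Thanks in advance.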

$ cat /tmp/go3.py
import multiprocessing as mp
print mp.Pool(1).map(abs, range(3))
print list(mp.Pool(1).imap(abs, range(3)))

$ python /tmp/go3.py
[0, 1, 2]
^C^C^C^C^C^\Quit

First, note that this works:

import multiprocessing as mp
import multiprocessing.util as util
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))

The difference is that pool is not finalized when the call to pool.imap() ends.

In contrast,

print(list(mp.Pool(1).imap(abs, range(3))))

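causes the Pool instance to be finalized soon after the imap call ends. Because no reference to the pool remains, the Finalizer (called self._terminate in the Pool class) is invoked. This sets in motion a sequence of commands that tears down the task handler thread, the result handler thread, the worker subprocesses, and so on.

This all happens so quickly that, at least on the majority of runs, the task sent to the task handler does not complete.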
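
A minimal sketch of the workaround this implies, assuming Python 3 syntax (the question itself uses Python 2.6): bind the pool to a name so it outlives the imap call, then shut it down explicitly. `collect_imap` is a hypothetical helper name, not part of multiprocessing.

```python
import multiprocessing as mp


def collect_imap(pool, func, iterable):
    """Consume pool.imap fully while the caller still holds a reference to `pool`."""
    return list(pool.imap(func, iterable))


if __name__ == "__main__":
    pool = mp.Pool(1)  # the named reference keeps the pool from being finalized early
    try:
        print(collect_imap(pool, abs, range(3)))
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker process to exit
```

Calling close() followed by join() is the documented orderly shutdown; it avoids relying on garbage collection to decide when the pool is torn down.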

Here are the relevant bits of code:

From /usr/lib/python2.6/multiprocessing/pool.py:

class Pool(object):
    def __init__(self, processes=None, initializer=None, initargs=()):
        ...
        self._terminate = Finalize(
            self, self._terminate_pool,
            args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
                  self._task_handler, self._result_handler, self._cache),
            exitpriority=15
            )

/usr/lib/python2.6/multiprocessing/util.py:

class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        ...
        if obj is not None:
            self._weakref = weakref.ref(obj, self)   

weakref.ref(obj, self) causes self() to be called when obj is about to be finalized.
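
To see that mechanism in isolation, here is a small sketch that uses only the weakref module. `Resource`, `on_collect`, and `demo` are made-up names for illustration, and the immediate callback on `del` relies on CPython's reference counting; other interpreters may run it later.

```python
import weakref


class Resource(object):
    """Stand-in for an object such as a Pool (hypothetical, for illustration)."""


calls = []


def on_collect(ref):
    # Invoked by the interpreter with the dead weak reference as its argument.
    calls.append("finalized")


def demo():
    del calls[:]
    obj = Resource()
    ref = weakref.ref(obj, on_collect)  # register the callback
    before = list(calls)                # nothing has fired yet
    del obj                             # last strong reference goes away
    return before, list(calls), ref() is None


if __name__ == "__main__":
    print(demo())
```

An unnamed `mp.Pool(1)` is in the same position as `obj` after the `del`: once nothing refers to it, its registered callback runs.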

I used the debugging command util.log_to_stderr(util.SUBDEBUG) to learn the sequence of events. For example:

import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)

print(list(mp.Pool(1).imap(abs, range(3))))

yields

[DEBUG/MainProcess] created semlock with handle 3077013504
[DEBUG/MainProcess] created semlock with handle 3077009408
[DEBUG/MainProcess] created semlock with handle 3077005312
[DEBUG/MainProcess] created semlock with handle 3077001216
[INFO/PoolWorker-1] child process calling self.run()
[SUBDEBUG/MainProcess] finalizer calling <bound method type._terminate_pool of <class 'multiprocessing.pool.Pool'>> with args (<Queue.Queue instance at 0x9d6e62c>, <multiprocessing.queues.SimpleQueue object at 0x9cf04cc>, <multiprocessing.queues.SimpleQueue object at 0x9d6e40c>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1217967248)>, <Thread(Thread-2, started daemon -1226359952)>, {0: <multiprocessing.pool.IMapIterator object at 0x9d6eaec>}) and kwargs {}
[DEBUG/MainProcess] finalizing pool
...

Compare that with

import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))

which yields

[DEBUG/MainProcess] created semlock with handle 3078684672
[DEBUG/MainProcess] created semlock with handle 3078680576
[DEBUG/MainProcess] created semlock with handle 3078676480
[DEBUG/MainProcess] created semlock with handle 3078672384
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] doing set_length()
[0, 1, 2]
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[SUBDEBUG/MainProcess] calling <Finalize object, callback=_terminate_pool, args=(<Queue.Queue instance at 0xb763e60c>, <multiprocessing.queues.SimpleQueue object at 0xb76c94ac>, <multiprocessing.queues.SimpleQueue object at 0xb763e3ec>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1218274448)>, <Thread(Thread-2, started daemon -1226667152)>, {}), exitprority=15>
...
[DEBUG/MainProcess] finalizing pool

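In my case, I was calling pool.imap() without expecting a return value, and I could not get it to work. However, when I tried it with pool.map(), it worked fine. The issue is exactly as the previous answer describes: no finalizer was called, so the process was effectively dumped before it started.

The solution is to call a finalizer, that is, something that consumes the iterator, such as the list() function. This makes it work properly, because the results now have to be handed over to list(), and so the process is executed. In short, it looks like this (simplified, of course; for now, pretend it does something useful):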

from multiprocessing import Pool
from shutil import copy
from tqdm import tqdm

filedict = { r"C:\src\file1.txt": r"C:\trg\file1_fixed.txt",
             r"C:\src\file2.txt": r"C:\trg\file2_fixed.txt",
             r"C:\src\file3.txt": r"C:\trg\file3_fixed.txt",
             r"C:\src\file4.txt": r"C:\trg\file4_fixed.txt" }

# target function, run in the worker processes
def copyfile(srctrg):
    copy(srctrg[0], srctrg[1])
    return True

# the __main__ guard is needed on Windows, where workers re-import this module
if __name__ == "__main__":
    # a couple of trial processes for illustration
    with Pool(2) as pool:

        # works fine with map, but cannot utilize tqdm() since no iterator object is returned
        pool.map(copyfile, list(filedict.items()))

        # will not work, since no finalizer is called for imap
        tqdm(pool.imap(copyfile, list(filedict.items())))    # NOT WORKING

        # this works, since the finalization is forced for the process
        list(tqdm(pool.imap(copyfile, list(filedict.items()))))

In my case, the simple solution was to wrap the whole tqdm(pool.imap(...)) in list() to force execution.
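
One way to read why the list() wrapper helps, offered as a sketch rather than a definitive account: imap returns an iterator right away, and in Python 3 leaving the `with Pool(...)` block calls pool.terminate(), so something has to consume the iterator before the block ends. `square` and `drain` below are hypothetical names; a plain loop stands in for list(tqdm(...)) so the sketch has no third-party dependency.

```python
from multiprocessing import Pool


def square(x):
    # Defined at module level so worker processes can import it.
    return x * x


def drain(results):
    """Consume an iterator completely and return how many items arrived."""
    count = 0
    for _ in results:
        count += 1
    return count


if __name__ == "__main__":
    with Pool(2) as pool:
        # Draining inside the with block waits for every task to finish
        # before the pool is terminated on exit.
        finished = drain(pool.imap(square, range(4)))
    print(finished)
```

Any full consumer works the same way here: list(), a for loop, or a for loop over tqdm(...) when a progress bar is wanted.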
