multiprocessing Pool.imap broken?
I've tried both the multiprocessing included in the python2.6 Ubuntu package (__version__ says 0.70a1) and the latest from PyPI (2.6.2.1). In both cases I don't know how to use imap correctly: it causes the entire interpreter to stop responding to Ctrl-C (map works fine, though). pdb shows next() hanging on the condition variable wait() call in IMapIterator, so nobody is waking us up. Any hints? Thanks in advance.
$ cat /tmp/go3.py
import multiprocessing as mp
print mp.Pool(1).map(abs, range(3))
print list(mp.Pool(1).imap(abs, range(3)))
$ python /tmp/go3.py
[0, 1, 2]
^C^C^C^C^C^\Quit
First notice that this works:
import multiprocessing as mp
import multiprocessing.util as util
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))
The difference is that pool does not get finalized when the call to pool.imap() ends.
In contrast,
print(list(mp.Pool(1).imap(abs, range(3))))
causes the Pool instance to be finalized soon after the imap call ends. The lack of a reference causes the Finalizer (called self._terminate in the Pool class) to be called. This sets in motion a sequence of commands which tears down the task handler thread, the result handler thread, the worker subprocesses, etc.
This all happens so quickly that, at least on the majority of runs, the task sent to the task handler does not complete.
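The reference-counting argument can be reproduced in pure Python. In this sketch FakePool is a hypothetical stand-in, not the real Pool, and weakref.finalize is used as the public stdlib counterpart of multiprocessing's internal Finalize helper. It assumes CPython, where an object is collected as soon as its last reference disappears: the temporary pool is finalized before its iterator is ever consumed, while a pool bound to a name is not.

```python
import weakref


class FakePool:
    """Hypothetical stand-in for Pool. Like the Python 2.6 Pool described above,
    imap() returns an iterator that holds no reference back to the pool."""

    def __init__(self, log):
        # Run a cleanup callback once this object is garbage-collected.
        # The callback (log.append) does not reference self.
        weakref.finalize(self, log.append, "terminated")

    def imap(self, func, iterable):
        # The returned iterator references func and the items, but not self.
        return map(func, list(iterable))


# Temporary pool: nothing refers to it once the statement finishes, so CPython
# collects it (and runs the finalizer) before the iterator is consumed.
temp_log = []
it = FakePool(temp_log).imap(abs, range(3))
print(temp_log)   # ['terminated']

# Named pool: the finalizer does not run while `pool` is still bound.
named_log = []
pool = FakePool(named_log)
it2 = pool.imap(abs, range(3))
print(named_log)  # []
print(list(it2))  # [0, 1, 2]
```

With the real Python 2.6 Pool, "terminated" corresponds to _terminate_pool tearing down the handler threads and workers, which is why the unnamed version hangs.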
Here are the relevant bits of code:
From /usr/lib/python2.6/multiprocessing/pool.py:
class Pool(object):
    def __init__(self, processes=None, initializer=None, initargs=()):
        ...
        self._terminate = Finalize(
            self, self._terminate_pool,
            args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
                  self._task_handler, self._result_handler, self._cache),
            exitpriority=15
            )
From /usr/lib/python2.6/multiprocessing/util.py:
class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        ...
        if obj is not None:
            self._weakref = weakref.ref(obj, self)
The weakref.ref(obj, self) call causes self() to be called when obj is about to be finalized.
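The mechanism can be sketched in a few lines. MiniFinalize and Resource below are hypothetical simplifications of the quoted Finalize class, not the real implementation; they only show that passing self as the weakref callback makes Python call the finalizer object when obj dies. For this to work, the callback and its arguments must not refer to obj. That is the case for Pool: as the SUBDEBUG log further down shows (`bound method type._terminate_pool of <class 'multiprocessing.pool.Pool'>`), the registered callback is bound to the Pool class, not to the instance, and the args tuple holds the queues, threads and workers rather than the pool itself.

```python
import weakref


class MiniFinalize:
    """Hypothetical, stripped-down version of the Finalize pattern:
    the object registers itself as the weakref callback."""

    def __init__(self, obj, callback, args=()):
        self._callback = callback
        self._args = args
        # weakref.ref(obj, self): when obj is about to be finalized, Python
        # calls self(wr), where wr is this (now dead) weak reference.
        self._weakref = weakref.ref(obj, self)

    def __call__(self, wr=None):
        # Run the cleanup at most once.
        if self._callback is not None:
            self._callback(*self._args)
            self._callback = None


class Resource:
    """Hypothetical stand-in for a Pool instance."""


events = []
res = Resource()
# Neither the callback nor its args refer to res, so res can become unreachable.
fin = MiniFinalize(res, events.append, args=("cleanup ran",))
print(events)  # []
del res        # last strong reference gone (collected immediately in CPython)
print(events)  # ['cleanup ran']
```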
I used the debug command util.log_to_stderr(util.SUBDEBUG) to learn the sequence of events. For example:
import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)
print(list(mp.Pool(1).imap(abs, range(3))))
yields
[DEBUG/MainProcess] created semlock with handle 3077013504
[DEBUG/MainProcess] created semlock with handle 3077009408
[DEBUG/MainProcess] created semlock with handle 3077005312
[DEBUG/MainProcess] created semlock with handle 3077001216
[INFO/PoolWorker-1] child process calling self.run()
[SUBDEBUG/MainProcess] finalizer calling <bound method type._terminate_pool of <class 'multiprocessing.pool.Pool'>> with args (<Queue.Queue instance at 0x9d6e62c>, <multiprocessing.queues.SimpleQueue object at 0x9cf04cc>, <multiprocessing.queues.SimpleQueue object at 0x9d6e40c>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1217967248)>, <Thread(Thread-2, started daemon -1226359952)>, {0: <multiprocessing.pool.IMapIterator object at 0x9d6eaec>}) and kwargs {}
[DEBUG/MainProcess] finalizing pool
...
and compare that with
import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))
which yields
[DEBUG/MainProcess] created semlock with handle 3078684672
[DEBUG/MainProcess] created semlock with handle 3078680576
[DEBUG/MainProcess] created semlock with handle 3078676480
[DEBUG/MainProcess] created semlock with handle 3078672384
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] doing set_length()
[0, 1, 2]
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[SUBDEBUG/MainProcess] calling <Finalize object, callback=_terminate_pool, args=(<Queue.Queue instance at 0xb763e60c>, <multiprocessing.queues.SimpleQueue object at 0xb76c94ac>, <multiprocessing.queues.SimpleQueue object at 0xb763e3ec>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1218274448)>, <Thread(Thread-2, started daemon -1226667152)>, {}), exitprority=15>
...
[DEBUG/MainProcess] finalizing pool
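On Python 3.3 and later, where Pool supports the with statement, the working pattern can be sketched as follows (Python 3 syntax, unlike the 2.6 snippets above). abs_values is a hypothetical helper. The point is that the pool is bound to a name for as long as the imap iterator is consumed, and that the iterator is drained inside the with block, because Pool.__exit__ calls terminate(). The `if __name__ == "__main__":` guard is needed on platforms that start workers with the spawn method, such as Windows.

```python
from multiprocessing import Pool


def abs_values(numbers):
    """Hypothetical helper: apply abs() in a worker process via imap."""
    # The with statement keeps a reference to the pool, so it cannot be
    # garbage-collected (and finalized) while imap's iterator is in use.
    with Pool(1) as pool:
        # list() drains the lazy iterator *inside* the block; consuming it
        # after Pool.__exit__ has called terminate() would be too late.
        return list(pool.imap(abs, numbers))


if __name__ == "__main__":
    print(abs_values(range(3)))  # [0, 1, 2]
```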
In my case, I was calling pool.imap() without using its return value, and it didn't work. However, when I tried pool.map() instead, it worked fine. The issue is closely related to the one in the previous answer: the pool was torn down (here by the with block, whose exit calls terminate()) before the tasks had a chance to finish.
The solution was to consume the iterator, for example by passing it to list(). This makes it work correctly: list() has to pull every result out of the iterator, so the call blocks until all tasks have completed, and only then does the with block exit. Here is a simplified example (of course, just pretend it does something useful):
from multiprocessing import Pool
from shutil import copy
from tqdm import tqdm

filedict = { r"C:\src\file1.txt": r"C:\trg\file1_fixed.txt",
             r"C:\src\file2.txt": r"C:\trg\file2_fixed.txt",
             r"C:\src\file3.txt": r"C:\trg\file3_fixed.txt",
             r"C:\src\file4.txt": r"C:\trg\file4_fixed.txt" }

# target process
def copyfile(srctrg):
    copy(srctrg[0], srctrg[1])
    return True

# a couple of trial processes for illustration
# (the __main__ guard is required on Windows, where workers are spawned)
if __name__ == "__main__":
    with Pool(2) as pool:
        # works fine with map, but tqdm() is of no use here: map() blocks and
        # returns a finished list instead of an iterator
        pool.map(copyfile, list(filedict.items()))
        # does not work on its own: tqdm() only wraps the lazy imap iterator and
        # nothing consumes it, so the with block can exit (calling terminate())
        # before the copies finish
        tqdm(pool.imap(copyfile, list(filedict.items())))  # NOT WORKING
        # this works, since list() consumes the iterator and blocks until every result is in
        list(tqdm(pool.imap(copyfile, list(filedict.items()))))
In my case, the simple solution was to wrap the entire tqdm(pool.imap(...)) call in list() to force the execution.
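Why wrapping in tqdm() alone is not enough can be shown without multiprocessing at all. In this sketch, source() and progress() are hypothetical stand-ins for pool.imap() and tqdm(): wrapping a lazy iterator pulls nothing from it, and only list() (or a plain for loop) drives it to the end. With a real Pool the tasks are handed to the pool's task handler as soon as imap() is called; what consuming the iterator adds is that the main process waits for every result before the with block exits and calls terminate().

```python
consumed = []


def source():
    """Hypothetical stand-in for the lazy iterator returned by pool.imap()."""
    for i in range(3):
        consumed.append(i)
        yield i


def progress(iterable):
    """Hypothetical stand-in for a progress-bar wrapper such as tqdm():
    it only pulls items when someone iterates over it."""
    for n, item in enumerate(iterable, start=1):
        print(f"item {n}")
        yield item


wrapped = progress(source())
print(consumed)          # []  (wrapping alone consumes nothing)
results = list(wrapped)  # list() drives the iteration to the end
print(consumed)          # [0, 1, 2]
```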