multiprocessing Pool.imap broken?
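I've tried both the multiprocessing included in the python2.6 Ubuntu package (__version__ says 0.70a1) and the latest version from PyPI (2.6.2.1). In both cases I can't work out how to use imap correctly: it causes the entire interpreter to stop responding to ctrl-C (map works fine, though). pdb shows that next() is hanging on the condition variable wait() call in IMapIterator, so nobody is waking us up. Any hints? Thanks in advance.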
$ cat /tmp/go3.py
import multiprocessing as mp
print mp.Pool(1).map(abs, range(3))
print list(mp.Pool(1).imap(abs, range(3)))
$ python /tmp/go3.py
[0, 1, 2]
^C^C^C^C^C^\Quit
First, note that this works:
import multiprocessing as mp
import multiprocessing.util as util
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))
The difference is that pool does not get finalized when the call to pool.imap() ends.
In contrast,
print(list(mp.Pool(1).imap(abs, range(3))))
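causes the Pool instance to be finalized soon after the imap call ends. Because nothing references the pool any longer, its Finalizer (stored as self._terminate in the Pool class) gets called. This sets in motion a sequence of commands that tears down the task handler thread, the result handler thread, the worker subprocesses, and so on.
This all happens so quickly that, at least on most runs, the task sent to the task handler never completes.
Here are the relevant bits of code:
From /usr/lib/python2.6/multiprocessing/pool.py: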
class Pool(object):
    def __init__(self, processes=None, initializer=None, initargs=()):
        ...
        self._terminate = Finalize(
            self, self._terminate_pool,
            args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
                  self._task_handler, self._result_handler, self._cache),
            exitpriority=15
            )
/usr/lib/python2.6/multiprocessing/util.py:
class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        ...
        if obj is not None:
            self._weakref = weakref.ref(obj, self)
When obj is about to be finalized, weakref.ref(obj, self) causes self() to be called.
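The weak-reference callback mechanism can be seen in isolation with a small sketch. This is not the multiprocessing code itself: Resource, on_finalize, and create_temporary are hypothetical names used only for illustration, and the immediate callback relies on CPython's reference counting.

```python
import gc
import weakref


class Resource:
    """Stand-in for an object such as a Pool (hypothetical name)."""


events = []


def on_finalize(dead_ref):
    # Called with the (now dead) weak reference once the referent is gone.
    events.append("callback ran")


def create_temporary():
    obj = Resource()
    ref = weakref.ref(obj, on_finalize)
    # Returning only the weak reference drops the last strong reference to obj.
    return ref


ref = create_temporary()
gc.collect()  # not needed under CPython's reference counting, harmless elsewhere

print(ref() is None, events)
```

As long as the weak reference object itself is still alive, the callback fires as soon as the last strong reference to the referent disappears, which is what happens to an unnamed Pool instance.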
I used the debugging call util.log_to_stderr(util.SUBDEBUG) to learn the sequence of events. For example:
import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)
print(list(mp.Pool(1).imap(abs, range(3))))
yields
[DEBUG/MainProcess] created semlock with handle 3077013504
[DEBUG/MainProcess] created semlock with handle 3077009408
[DEBUG/MainProcess] created semlock with handle 3077005312
[DEBUG/MainProcess] created semlock with handle 3077001216
[INFO/PoolWorker-1] child process calling self.run()
[SUBDEBUG/MainProcess] finalizer calling <bound method type._terminate_pool of <class 'multiprocessing.pool.Pool'>> with args (<Queue.Queue instance at 0x9d6e62c>, <multiprocessing.queues.SimpleQueue object at 0x9cf04cc>, <multiprocessing.queues.SimpleQueue object at 0x9d6e40c>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1217967248)>, <Thread(Thread-2, started daemon -1226359952)>, {0: <multiprocessing.pool.IMapIterator object at 0x9d6eaec>}) and kwargs {}
[DEBUG/MainProcess] finalizing pool
...
and compare that with
import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))
which yields
[DEBUG/MainProcess] created semlock with handle 3078684672
[DEBUG/MainProcess] created semlock with handle 3078680576
[DEBUG/MainProcess] created semlock with handle 3078676480
[DEBUG/MainProcess] created semlock with handle 3078672384
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] doing set_length()
[0, 1, 2]
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[SUBDEBUG/MainProcess] calling <Finalize object, callback=_terminate_pool, args=(<Queue.Queue instance at 0xb763e60c>, <multiprocessing.queues.SimpleQueue object at 0xb76c94ac>, <multiprocessing.queues.SimpleQueue object at 0xb763e3ec>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1218274448)>, <Thread(Thread-2, started daemon -1226667152)>, {}), exitprority=15>
...
[DEBUG/MainProcess] finalizing pool
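Following the explanation above, one way to avoid the early teardown is to keep the pool bound to a name until the results have been collected, then shut it down explicitly. The following is a minimal sketch in Python 3 syntax; the helper name imap_with_live_pool is an assumption made for illustration and is not part of multiprocessing.

```python
import multiprocessing as mp


def imap_with_live_pool(values):
    # Bind the pool to a name so it stays referenced while results arrive.
    pool = mp.Pool(1)
    try:
        return list(pool.imap(abs, values))
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the worker process to exit


if __name__ == "__main__":
    print(imap_with_live_pool(range(3)))
```

Calling close() followed by join() lets outstanding work finish before the workers exit, instead of leaving the shutdown to garbage collection.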
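In my case, I was calling pool.imap() without using its return value, and it did not work. When I tried the same thing with pool.map(), it worked fine. The problem was essentially the one described in the previous answer: nothing waited for the results, so the pool was torn down before the work had effectively started.
The solution is to force the iterator to be consumed, for example with list(). Because list() has to collect every result, the tasks actually get executed. In brief, it looks like this (simplified, of course; for now, just pretend it does something useful):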
from multiprocessing import Pool
from shutil import copy
from tqdm import tqdm
filedict = {r"C:\src\file1.txt": r"C:\trg\file1_fixed.txt",
            r"C:\src\file2.txt": r"C:\trg\file2_fixed.txt",
            r"C:\src\file3.txt": r"C:\trg\file3_fixed.txt",
            r"C:\src\file4.txt": r"C:\trg\file4_fixed.txt"}
# target process
def copyfile(srctrg):
    copy(srctrg[0], srctrg[1])
    return True
# a couple of trial processes for illustration
if __name__ == "__main__":  # needed on Windows, where workers re-import this module
    with Pool(2) as pool:
        # works fine with map, but cannot utilize tqdm() since no iterator object is returned
        pool.map(copyfile, list(filedict.items()))
        # will not work, since nothing consumes the iterator returned by imap
        tqdm(pool.imap(copyfile, list(filedict.items())))  # NOT WORKING
        # this works, since list() forces the iterator to be consumed
        list(tqdm(pool.imap(copyfile, list(filedict.items()))))
In my case, the simple solution was to wrap the whole tqdm(pool.imap(...)) in list() to force execution.
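If the results are needed one at a time, an explicit loop inside the with block consumes the iterator just as list() does. Here is a minimal sketch without the tqdm dependency; work and run_with_progress are hypothetical names, and the squaring task is only a placeholder for something like copyfile.

```python
from multiprocessing import Pool


def work(n):
    # Placeholder task; stands in for something like copyfile above.
    return n * n


def run_with_progress(values):
    values = list(values)
    results = []
    with Pool(2) as pool:
        # Iterating inside the with block waits for each result in turn,
        # so the pool is not terminated before the tasks finish.
        for done, result in enumerate(pool.imap(work, values), start=1):
            results.append(result)
            print(f"{done}/{len(values)} done")
    return results


if __name__ == "__main__":
    print(run_with_progress(range(4)))
```

The loop has to stay inside the with block, because leaving the block calls terminate() on the pool.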