
multiprocessing Pool.imap broken?

I've tried both the multiprocessing included in the python2.6 Ubuntu package (__version__ says 0.70a1) and the latest from PyPI (2.6.2.1). In both cases I don't know how to use imap correctly - it causes the entire interpreter to stop responding to ctrl-C's (map works fine though). pdb shows next() is hanging on the condition variable wait() call in IMapIterator, so nobody is waking us up. Any hints? Thanks in advance.

$ cat /tmp/go3.py
import multiprocessing as mp
print mp.Pool(1).map(abs, range(3))
print list(mp.Pool(1).imap(abs, range(3)))

$ python /tmp/go3.py
[0, 1, 2]
^C^C^C^C^C^\Quit

First, notice that this works:

import multiprocessing as mp
import multiprocessing.util as util
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))

The difference is that pool does not get finalized when the call to pool.imap() ends.

In contrast,

print(list(mp.Pool(1).imap(abs, range(3))))

causes the Pool instance to be finalized soon after the imap call ends. Because nothing else holds a reference to the pool, its finalizer (stored as self._terminate in the Pool class) is called. This sets in motion a sequence of commands which tears down the task handler thread, result handler thread, worker subprocesses, etc.

This all happens so quickly that, at least on a majority of runs, the task sent to the task handler does not complete.
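
A minimal sketch of the safer pattern, assuming Python 3: keep a reference to the pool for as long as imap results are pending, then shut it down explicitly. The helper name collect_imap is hypothetical and not part of multiprocessing.

```python
import multiprocessing as mp


def collect_imap(pool, func, iterable):
    """Drain pool.imap() while `pool` is still referenced, then shut it down."""
    try:
        # The `pool` argument keeps the Pool alive until every result arrives.
        return list(pool.imap(func, iterable))
    finally:
        pool.close()  # no more tasks will be submitted
        pool.join()   # wait for the workers to exit


if __name__ == "__main__":
    print(collect_imap(mp.Pool(1), abs, range(3)))
```

Passing the pool in as an argument means a Pool created inline at the call site still stays referenced until the results have been collected.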

Here are the relevant bits of code:

From /usr/lib/python2.6/multiprocessing/pool.py:

class Pool(object):
    def __init__(self, processes=None, initializer=None, initargs=()):
        ...
        self._terminate = Finalize(
            self, self._terminate_pool,
            args=(self._taskqueue, self._inqueue, self._outqueue, self._pool,
                  self._task_handler, self._result_handler, self._cache),
            exitpriority=15
            )

From /usr/lib/python2.6/multiprocessing/util.py:

class Finalize(object):
    '''
    Class which supports object finalization using weakrefs
    '''
    def __init__(self, obj, callback, args=(), kwargs=None, exitpriority=None):
        ...
        if obj is not None:
            self._weakref = weakref.ref(obj, self)   

The weakref.ref(obj, self) call causes self() to be called when obj is about to be finalized.
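
The callback behaviour can be seen in isolation with a small sketch. It assumes CPython, where reference counting reclaims the object as soon as the last reference goes away; Owner and demo are hypothetical names standing in for Pool and its caller.

```python
import weakref


class Owner:
    """Stand-in for an object such as a Pool (hypothetical class)."""


def demo():
    events = []
    owner = Owner()
    # The second argument is called with the weakref once `owner` is reclaimed.
    ref = weakref.ref(owner, lambda r: events.append("finalized"))
    events.append("before del")
    del owner  # last strong reference gone: CPython runs the callback here
    events.append("after del")
    return events, ref() is None


if __name__ == "__main__":
    print(demo())
```

On other Python implementations the callback may run later, since collection is not tied to reference counts there.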

I used the debug command util.log_to_stderr(util.SUBDEBUG) to learn the sequence of events. For example:

import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)

print(list(mp.Pool(1).imap(abs, range(3))))

yields

[DEBUG/MainProcess] created semlock with handle 3077013504
[DEBUG/MainProcess] created semlock with handle 3077009408
[DEBUG/MainProcess] created semlock with handle 3077005312
[DEBUG/MainProcess] created semlock with handle 3077001216
[INFO/PoolWorker-1] child process calling self.run()
[SUBDEBUG/MainProcess] finalizer calling <bound method type._terminate_pool of <class 'multiprocessing.pool.Pool'>> with args (<Queue.Queue instance at 0x9d6e62c>, <multiprocessing.queues.SimpleQueue object at 0x9cf04cc>, <multiprocessing.queues.SimpleQueue object at 0x9d6e40c>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1217967248)>, <Thread(Thread-2, started daemon -1226359952)>, {0: <multiprocessing.pool.IMapIterator object at 0x9d6eaec>}) and kwargs {}
[DEBUG/MainProcess] finalizing pool
...

and compare that with

import multiprocessing as mp
import multiprocessing.util as util
util.log_to_stderr(util.SUBDEBUG)
pool=mp.Pool(1)
print list(pool.imap(abs, range(3)))

which yields

[DEBUG/MainProcess] created semlock with handle 3078684672
[DEBUG/MainProcess] created semlock with handle 3078680576
[DEBUG/MainProcess] created semlock with handle 3078676480
[DEBUG/MainProcess] created semlock with handle 3078672384
[INFO/PoolWorker-1] child process calling self.run()
[DEBUG/MainProcess] doing set_length()
[0, 1, 2]
[INFO/MainProcess] process shutting down
[DEBUG/MainProcess] running all "atexit" finalizers with priority >= 0
[SUBDEBUG/MainProcess] calling <Finalize object, callback=_terminate_pool, args=(<Queue.Queue instance at 0xb763e60c>, <multiprocessing.queues.SimpleQueue object at 0xb76c94ac>, <multiprocessing.queues.SimpleQueue object at 0xb763e3ec>, [<Process(PoolWorker-1, started daemon)>], <Thread(Thread-1, started daemon -1218274448)>, <Thread(Thread-2, started daemon -1226667152)>, {}), exitprority=15>
...
[DEBUG/MainProcess] finalizing pool
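
Since the question reports next() blocking indefinitely in wait(), one defensive option is to pass a timeout to the iterator's next() method, so that a stalled pool raises multiprocessing.TimeoutError instead of hanging silently. This is only a sketch, assuming Python 3; drain_with_timeout and the 5-second default are hypothetical choices.

```python
import multiprocessing as mp


def drain_with_timeout(pool, func, iterable, timeout=5.0):
    """Collect imap results, raising mp.TimeoutError instead of blocking forever."""
    results = []
    iterator = pool.imap(func, iterable)
    while True:
        try:
            # next(timeout) is the timeout form of the imap iterator's next().
            results.append(iterator.next(timeout))
        except StopIteration:
            return results


if __name__ == "__main__":
    with mp.Pool(1) as pool:
        try:
            print(drain_with_timeout(pool, abs, range(3)))
        except mp.TimeoutError:
            print("no result within the timeout; the pool may have been torn down")
```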

In my case, I was calling pool.imap() without using its return value, and it did not work. With pool.map(), however, it worked fine. The issue was in line with the previous answer: nothing consumed the results, so the pool was torn down and the work was effectively dumped before it had started.

The solution was to wrap the call in something that consumes the iterator, such as list(). This made it work correctly, since list() needs every result handed to it, and so the tasks were actually executed. In brief, it is illustrated below (this is, of course, simplified; for now, just pretend it does something useful):

from multiprocessing import Pool
from shutil import copy
from tqdm import tqdm

filedict = { r"C:\src\file1.txt": r"C:\trg\file1_fixed.txt",
             r"C:\src\file2.txt": r"C:\trg\file2_fixed.txt",
             r"C:\src\file3.txt": r"C:\trg\file3_fixed.txt",
             r"C:\src\file4.txt": r"C:\trg\file4_fixed.txt" }

# target function run in the worker processes
def copyfile(srctrg):
    copy(srctrg[0], srctrg[1])
    return True

# a couple of trial calls for illustration
if __name__ == "__main__":  # guard needed on Windows, where workers re-import this module
    with Pool(2) as pool:

        # works fine with map, but tqdm() cannot show progress since map returns a finished list
        pool.map(copyfile, list(filedict.items()))

        # not reliable on its own: nothing consumes the iterator before the with block ends
        tqdm(pool.imap(copyfile, list(filedict.items())))    # NOT WORKING

        # this works, since list() consumes every result before the pool is torn down
        list(tqdm(pool.imap(copyfile, list(filedict.items()))))

In my case, the simple solution was to enclose the entire tqdm(pool.imap(...)) in a list() in order to force the execution.
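
The same idea can be sketched without tqdm, assuming Python 3: any loop that iterates over pool.imap() inside the with block consumes the results, so list() is just one way of doing it. Here a plain counter stands in for the progress bar; square and run_with_progress are hypothetical names.

```python
from multiprocessing import Pool


def square(x):
    return x * x


def run_with_progress(pool, func, items):
    """Consume pool.imap() one result at a time, reporting progress as it goes."""
    results = []
    total = len(items)
    for done, value in enumerate(pool.imap(func, items), start=1):
        results.append(value)
        print(f"{done}/{total} done")
    return results


if __name__ == "__main__":
    # The loop finishes inside the with block, before the pool is terminated.
    with Pool(2) as pool:
        print(run_with_progress(pool, square, [1, 2, 3, 4]))
```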
