
Python multiprocessing of class method

I want to perform parallel processing of a class method. This method writes to a dict to save the data. Within threads it works, but once the method exits, the dict appears to be untouched. Here is a simplification of my problem:

from multiprocessing import Pool
class Test():
    def __init__(self):
        self.dict1 = {str(i): i for i in range(1, 10)}
        
    def add_smt(self, arg):
        key, indice = arg
        self.dict1[key] += 1
        print("key: {}, indice: {}, new_value: {}\n".format(key, indice, self.dict1[key]))
        
    def add_smt_multithread(self):
        with Pool() as pool:
            for response in pool.imap(self.add_smt, zip(self.dict1.keys(), range(1, len(self.dict1.keys())))):
                continue
a = Test()
a.add_smt_multithread() 

In this simplification, I want to add +1 to each self.dict1 key using a multiprocessing method. I was surprised that after a.add_smt_multithread, a.dict1 remains untouched. I edited self.add_smt to return a value and reworked self.add_smt_multithread into the following working code, but I want to understand the behavior of my first attempt, and perhaps find an easier solution (if you know one), since the first version is clearer in my opinion.

from multiprocessing import Pool
class Test():
    def __init__(self):
        self.dict1 = {str(i): i for i in range(1, 10)}
        
    def add_smt(self, arg):
        key, indice = arg
        self.dict1[key] += 1
        print("key: {}, indice: {}, new_value: {}\n".format(key, indice, self.dict1[key]))
        return [key, self.dict1[key]]
        
    def add_smt_multithread(self):
        with Pool() as pool:
            for response in pool.imap(self.add_smt, zip(self.dict1.keys(), range(1, len(self.dict1.keys())))):
                self.dict1[response[0]] = response[1]
a = Test()
a.add_smt_multithread() 

I tried to convert add_smt into a static method, but it wasn't successful. I want to call just one method of the object to start the multiprocessing. My original code contains a method that makes HTTP requests, and I want to parallelize these.

Your primary issue is a misunderstanding. When you say "Within threads it works, but once the method exits, the dict appears to be untouched", you're making an incorrect statement: multiprocessing workers are separate processes, not threads.

The important differences are:

  1. Threads share a single global memory space (with a special case for thread locals); processes each have their own independent memory space. Processes duplicate the parent process's state when they are first created (completely when using the 'fork' start method, the default on most UNIX-like OSes; an imperfect simulation thereof on Windows and macOS [the latter on Python 3.8+ only], where the default is the 'spawn' start method), but after that, the memory spaces are independent unless specific IPC mechanisms are used to explicitly share state (a short sketch after this list illustrates the difference).

  2. Threads are limited by the GIL (on the CPython reference interpreter and, IIRC, PyPy), a single shared global lock that allows only one thread to be in the bytecode interpreter loop, working with Python-level objects, at a time. For CPU-bound code not using low-level high-performance modules (mostly third party, e.g. numpy), this means you can't extract any speedup from threads (in older versions of Python, you'd actually take a pretty nasty performance hit from GIL contention if you used a lot of threads; the new GIL introduced in CPython 3.2 reduces that problem substantially, but it still limits you to getting roughly one core's worth of performance out of threaded Python-level code). If your code is I/O-bound, or it's doing large-scale number-crunching (e.g. with 10Kx10K numpy arrays or the like), or you're on a Python interpreter without the GIL (sadly, most such interpreters are wildly out of date: Jython's Py3 support remains not production-ready according to their own docs, and IronPython is only 3.4-compatible, seven releases behind CPython), threading may help.

  3. Processes are limited by IPC costs and limitations; with a Pool, every function and its arguments must be pickled (Python's serialization mechanism), sent to the worker process over a pipe, and unpickled on the worker side; the worker runs it, then the return value must be pickled, sent back to the parent, and unpickled on the parent side. If the objects in question can't be made picklable, processes don't work, and if the amount of work being done is too small relative to the amount of data that must be serialized and transmitted over the pipe, they won't produce any savings. They're also more memory-hungry on CPython (where the cyclic garbage collector touching reference counts means that, unless you do careful things with gc.freeze, even on a fork-based system most of the pages mapped into the child as copy-on-write will end up being written and start consuming real memory). This also means that any changes made to the function or arguments will not be seen in the parent process (because the function and arguments received on the worker process side are copies of the parent's state, not aliases).

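As a concrete illustration of point 1, here is a minimal sketch (not from your code; the bump helper and the data dict are made up for the demonstration) showing that a mutation made in a worker process is invisible to the parent, while the same mutation made by a thread is visible, because threads share the parent's memory:

from multiprocessing import Process
from threading import Thread

data = {"count": 0}

def bump():
    data["count"] += 1  # mutates whichever copy of `data` this worker sees

if __name__ == '__main__':
    p = Process(target=bump)
    p.start()
    p.join()
    print(data["count"])  # still 0: the child process mutated its own copy

    t = Thread(target=bump)
    t.start()
    t.join()
    print(data["count"])  # now 1: the thread shares the parent's dict
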
#3 is your big problem here (#1 is a problem in many similar circumstances, but in this case, you're not relying on globals). When you call pool.imap(self.add_smt, zip(self.dict1.keys(), range(1, len(self.dict1.keys())))), it's repeatedly pickling self.add_smt (along with each set of arguments for each task), and the pickled form of self.add_smt is the pickled form of self itself, plus the name add_smt and a marker saying it needs to look up the latter on the former. Pickling is recursive, so this means you pickle self.dict1 each time, sending it along as part of the task, and a copy of it is realized in each worker. The worker's copy is up-to-date with the parent (assuming no races with threads modifying it), but it's independent; the only state sent back to the parent process is the method's return value, not the updated state of self.add_smt / self / self.dict1.
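
To make that concrete, here is a small sketch (not part of your code; it reuses the first Test class from your question) showing that pickling the bound method pickles the instance along with it, and that unpickling produces an independent copy whose dict1 can change without affecting the original:

import pickle

a = Test()                           # the first Test class from the question
blob = pickle.dumps(a.add_smt)       # serializes `a` itself, including a.dict1
clone_method = pickle.loads(blob)    # rebuilds a bound method on an independent copy of `a`
clone_method(("1", 1))               # increments the clone's dict1["1"] to 2 and prints it
print(a.dict1["1"])                  # still 1: the original instance was never touched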

Your proposed solution "works", but:

  1. It only works because the counts for any given key are always updated exactly once in the imap. If the same key were modified in two of the tasks imap creates (fast enough that the first task hasn't returned and updated the parent process), both would see the same initial value for the key, both would increment it once, and the parent would end up with only a single increment taking effect, not both (see the sketch after this list).

  2. It's extremely inefficient; self.add_smt (and therefore self and self.dict1) is pickled, written to a pipe, and unpickled for every task (not once per imap, but once per call to self.add_smt that it triggers). That's a lot of data being serialized and round-tripped for very little work done in the workers (the code is almost certainly slower as a result, and not by a small amount).

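Here is a deterministic sketch of that lost update, using plain pickle instead of a real Pool to simulate two tasks that are handed the same stale snapshot of self (it uses your second, return-based Test class; taking a single shared snapshot is an assumption standing in for two tasks being pickled before either result is merged back):

import pickle

a = Test()                               # the second (return-based) Test from the question
snapshot = pickle.dumps(a.add_smt)       # roughly what the Pool ships per task; grows with dict1
results = []
for arg in [("1", 1), ("1", 2)]:         # two tasks hitting the same key
    worker_copy = pickle.loads(snapshot) # each "worker" starts from the old state
    results.append(worker_copy(arg))     # both return ["1", 2]

for key, value in results:               # the merge loop from add_smt_multithread
    a.dict1[key] = value

print(a.dict1["1"])                      # 2, not 3: one of the two increments was lost
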
The real solution, for I/O-bound code where the GIL isn't a big issue (though you will need to be careful of data races), is usually to get rid of the separate processes and the data copies that come with them. You can change your import(s) from multiprocessing (process-based) to multiprocessing.dummy (a reimplementation of the multiprocessing API backed by threads), and now the threads will all see/use the same shared data.

You do need to add locking to be correct. I suspect the specifics of CPython's GIL implementation, and your use of string keys, mean it would probably work even without explicit locks, but you don't want to rely on that; even on CPython, the rules for when the GIL can transition have changed over time, so while self.dict1[key] += 1 might be effectively atomic in 3.11 (because GIL hand-offs don't currently occur during the opcodes for loading a value from a dict, incrementing it, and storing it back), that could change in even a patch release (there are no documented rules for when the GIL can't be handed off, except that it must either happen between bytecodes or be due to a C extension explicitly releasing it). So your final code could look something like:

from multiprocessing.dummy import Pool, Lock  # Use thread-based Pool/locks, not process-based

class Test():
    def __init__(self):
        self.dict1 = {str(i): i for i in range(1, 10)}
        self._lock = Lock()
        
    def add_smt(self, arg):
        key, indice = arg
        # Protect all access to dict1 behind a lock
        with self._lock:
            self.dict1[key] += 1
            print("key: {}, indice: {}, new_value: {}\n".format(key, indice, self.dict1[key]))

        # Alternate approach that avoids holding the lock while printing
        # (use this *instead of* the block above, not in addition to it):
        # with self._lock:
        #     # Cache the result of the increment to avoid a racy lookup in print
        #     self.dict1[key] = val = self.dict1[key] + 1
        # print("key: {}, indice: {}, new_value: {}\n".format(key, indice, val))  # Use cached val
        
    def add_smt_multithread(self):
        with Pool() as pool:
            # Minor tweak: Removed .keys() from all uses; dicts already iterate by key,
            # so making a keys view just wastes time when the dict would iterate the same way.
            # I also changed it to use imap_unordered, as you don't care about result
            # ordering, and it saves a lot of work if you don't make Python queue up results
            # to preserve output ordering; imap is the slowest of the map-like calls
            # because of this work, while map and imap_unordered are extremely efficient,
            # the former because it can chunk up work, the latter because it has the
            # lowest possible job-tracking overhead.
            # Minor check: Should the range be from 1 to len(self.dict1)+1? As written,
            # you don't map the final key, because the range is one element smaller than the dict.
            for response in pool.imap_unordered(self.add_smt,
                                                zip(self.dict1, range(1, len(self.dict1)))):
                pass  # Minor tweak: continue implies skipping, pass means "do nothing at all"

# Lacking this import guard would break your original code even worse on Windows and macOS
# and including it even when processes aren't involved is a good safety measure
if __name__ == '__main__':
    a = Test()
    a.add_smt_multithread()
