Python Multiprocessing apply_async not pickleable?

I need to evaluate a function a large number of times (approximately 1,000,000 calls), and since this is very time-consuming, I am using multiprocessing.Pool.apply_async. However, when I try to read a result using the .get() method of the AsyncResult class, I get an error:

File "Test.py", line 17, in <module>
    Test()
  File "Test.py", line 11, in __init__
    self.testList[i].get(5)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects

A simplified class that gives the same error:

import multiprocessing as mp
import numpy as np

class Test:
    def __init__(self):
        pool = mp.Pool(processes = 4)
        self.testList = [0,0,0,0]
        for i in range(0,len(self.testList)):
            self.testList[i] = pool.apply_async(self.run, (1,))
        for i in range(0,len(self.testList)):
            self.testList[i].get(5)

    def run(self, i):
        return 1


Test()

Interestingly, if I use a local variable testList instead of the attribute self.testList, the code works fine. However, when I compare the two using .ready() instead of .get(), I find that the self.testList version is about 1000x faster than the testList version (something I cannot explain). So I would really like to find a way to use self.testList.

I've been searching around, and although there are other threads about this, they seem to focus more on Queues than on apply_async. Any help would be appreciated!

Thank you!

Edit: It seems the initial problem occurred because I was calling mp.Pool inside a class. When I create the same pool outside of a class, the program runs, but it is extremely slow (about 30x slower) compared to the code in the class. I tested this using the .ready() method, which runs without error in both cases. Here is a minimal example:

import multiprocessing as mp
import numpy as np
import time

class Test:
    def __init__(self):
        pool = mp.Pool(processes = 4)
        self.testList = [0 for i in range(0,100000)]
        for i in range(0,len(self.testList)):
            self.testList[i] = pool.apply_async(self.run, (1,))
        for i in range(0,len(self.testList)):
            while not self.testList[i].ready():
                continue

    def run(self, i):
        return 1

def functionTest():
    pool = mp.Pool(processes = 4)
    testList = [0 for i in range(0,100000)]
    for i in range(0,len(testList)):
        testList[i] = pool.apply_async(run, (1,))
    for i in range(0,len(testList)):
        while not testList[i].ready():
            continue

def run(i):
    return 1


startTime1 = time.time()
Test()
startTime2 = time.time()
print(startTime2-startTime1)



startTime1 = time.time()
functionTest()
startTime2 = time.time()
print(startTime2-startTime1)

The output of this test is:

5.861901044845581
151.7218940258026

I tried looking for ways to get the class approach to work, such as taking the multiprocessing out of the __init__ function, or passing the pool object to the class instead of having the class create it. Unfortunately, neither of these approaches works. I would really like to find an approach that works and is still fast. Thank you for your help!

You're trying to pickle the whole Test instance when you submit tasks to the pool: passing self.run to apply_async means self has to be pickled and sent to the worker processes, and self contains objects created by mp.Pool in __init__ (the AsyncResult objects stored in self.testList, which hold thread locks). Copying those pool-related objects into the workers both doesn't work and doesn't really make sense here. Split your class into separate top-level functions instead, or at least move the multiprocessing code into its own function, outside of the Test class.
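A minimal sketch of that restructuring, assuming the work itself can live in a module-level function. The names run and Test mirror the question's code; the compute method and the n_tasks parameter are hypothetical, introduced only for illustration.

```python
import multiprocessing as mp


def run(i):
    # Module-level function: pickle sends only its qualified name to the
    # workers, so no Test instance (and no AsyncResult) travels with it.
    return 1


class Test:
    def __init__(self, n_tasks=4):
        # Hypothetical parameter; the question hard-codes the task count.
        self.n_tasks = n_tasks
        self.testList = []

    def compute(self, pool):
        # Keep the AsyncResult handles in a local list, not on self.
        pending = [pool.apply_async(run, (1,)) for _ in range(self.n_tasks)]
        # get() blocks until each result arrives and re-raises worker errors.
        self.testList = [p.get(timeout=5) for p in pending]
        return self.testList


if __name__ == "__main__":
    # The guard keeps worker processes from re-running this block when the
    # start method re-imports the main module (spawn or forkserver).
    with mp.Pool(processes=4) as pool:
        print(Test().compute(pool))
```

A likely explanation for the timing gap in the question: a task that cannot be pickled is marked as finished with the error attached, and ready() returns True for failed tasks as well as successful ones, so the class version may have been timing failures rather than completed work. Collecting results with get() instead of polling ready() surfaces such errors and avoids a busy-wait loop in the parent process.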
