
Python Multiprocessing apply_async not pickleable?

I am working on computing a large number of functions (approximately 1000000), and since it is very time-consuming, I am using the multiprocessing.Pool.apply_async function. However, when I then try to read the result using the .get() function of the AsyncResult class, I get an error:

  File "Test.py", line 17, in <module>
    Test()
  File "Test.py", line 11, in __init__
    self.testList[i].get(5)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 657, in get
    raise self._value
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/pool.py", line 431, in _handle_tasks
    put(task)
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/connection.py", line 206, in send
    self._send_bytes(_ForkingPickler.dumps(obj))
  File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/multiprocessing/reduction.py", line 51, in dumps
    cls(buf, protocol).dump(obj)
TypeError: can't pickle _thread.lock objects

A simplified class that gives the same error:

import multiprocessing as mp
import numpy as np

class Test:
    def __init__(self):
        pool = mp.Pool(processes = 4)
        self.testList = [0,0,0,0]
        for i in range(0,len(self.testList)):
            self.testList[i] = pool.apply_async(self.run, (1,))
        for i in range(0,len(self.testList)):
            self.testList[i].get(5)

    def run(self, i):
        return 1


Test()

Interestingly, if I replace self.testList with a local variable testList, the code works fine. However, when I compare the two using .ready() instead of .get(), I find that the self.testList version is about 1000x faster than the testList version (something I cannot explain). So, I would really like to find a way to use self.testList.

I've been searching around and although there are other threads about this, they seem to be focused more on Queues than on apply_async. Any help would be appreciated!

Thank you!

Edit: It seems like the initial problem occurred because I was calling mp.Pool inside a class. When I create the same process outside of a class, the program runs, but it is extremely slow (30x slower) compared to the code in the class (I tested this using the .ready() function, which works fine in both cases). Here is a minimal example:

import multiprocessing as mp
import numpy as np
import time

class Test:
    def __init__(self):
        pool = mp.Pool(processes = 4)
        self.testList = [0 for i in range(0,100000)]
        for i in range(0,len(self.testList)):
            self.testList[i] = pool.apply_async(self.run, (1,))
        for i in range(0,len(self.testList)):
            while not self.testList[i].ready():
                continue

    def run(self, i):
        return 1

def functionTest():
    pool = mp.Pool(processes = 4)
    testList = [0 for i in range(0,100000)]
    for i in range(0,len(testList)):
        testList[i] = pool.apply_async(run, (1,))
    for i in range(0,len(testList)):
        while not testList[i].ready():
            continue

def run(i):
    return 1


startTime1 = time.time()
Test()
startTime2 = time.time()
print(startTime2-startTime1)



startTime1 = time.time()
functionTest()
startTime2 = time.time()
print(startTime2-startTime1)

The output of this test is

5.861901044845581
151.7218940258026

I tried looking for ways to get the class approach to work, such as taking the multiprocessing out of the __init__ function or passing the pool object to the class instead of having the class create it. Unfortunately, neither of these approaches works. I would really like to find an approach that works and is still fast. Thank you for your help!

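apply_async has to pickle the callable it sends to the worker processes (processes, not threads). Because self.run is a bound method, pickling it pickles the whole Test instance, including self.testList. That list holds the AsyncResult objects returned by apply_async, and each of those contains a thread lock, which is where "can't pickle _thread.lock objects" comes from. It also explains why the class version looks so much faster with .ready(): a task that fails while being pickled has the error recorded as its result, so .ready() returns True without run ever executing. Copying those objects into the workers neither works nor makes sense here. Make run a top-level function instead, or at least move the multiprocessing code into its own function outside the Test class.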
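As a rough sketch of that suggestion (not the only way to structure it), the version below submits a module-level function, so nothing from the instance has to be pickled. The names run_task, compute, n_tasks, HoldsLock and can_pickle are made up for illustration. HoldsLock is only a stand-in for an instance that stores AsyncResult objects: it holds a plain threading.Lock to reproduce the same kind of pickling failure without starting a pool.

```python
import multiprocessing as mp
import pickle
import threading


def run_task(i):
    """Top-level worker: pickled by name, so no instance state travels with it."""
    return 1


class Test:
    """Stores plain results and submits run_task instead of a bound method."""

    def __init__(self, n_tasks=4):
        self.n_tasks = n_tasks
        self.results = []

    def compute(self):
        # The pool and the AsyncResult objects stay in local variables,
        # so they never end up inside anything that gets pickled.
        with mp.Pool(processes=4) as pool:
            pending = [pool.apply_async(run_task, (1,)) for _ in range(self.n_tasks)]
            self.results = [p.get(timeout=5) for p in pending]
        return self.results


class HoldsLock:
    """Hypothetical stand-in for an instance whose attributes contain a lock."""

    def __init__(self):
        self.lock = threading.Lock()

    def run(self, i):
        return 1


def can_pickle(obj):
    """Return True if pickle.dumps(obj) succeeds, False if pickling is refused."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


if __name__ == "__main__":
    # Pickling a bound method pickles its instance, lock included, so this fails.
    print(can_pickle(HoldsLock().run))
    # The top-level function is submitted without touching the instance.
    print(Test().compute())
```

Calling .get() on every result, as compute does, also surfaces any pickling error right away, whereas polling .ready() would hide it.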
