I've been playing around with a `Pool` object while using an instance method as the `func` argument. It's been a bit surprising with regard to instance state: it seems like the instance gets reset on every chunk. E.g.:
```python
import multiprocessing as mp
import logging


class Worker(object):
    def __init__(self):
        self.consumed = set()

    def consume(self, i):
        if i not in self.consumed:
            logging.info(i)
            self.consumed.add(i)


if __name__ == '__main__':
    n = 1
    logging.basicConfig(level='INFO', format='%(process)d: %(message)s')
    worker = Worker()

    with mp.Pool(processes=2) as pool:
        pool.map(worker.consume, [1] * 100, chunksize=n)
```
If `n` is set to 1, then `1` gets logged every time (100 times). If `n` is set to 20, it's logged 5 times, etc. What is the reason for this, and is there any way around it? I also wanted to use the `initializer` pool argument with an instance method but hit similar issues.
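The instance method `worker.consume` is passed to the worker processes on a queue. To accomplish this, it must be pickled, and pickling a bound method also pickles the instance it is bound to. For every job (one job per chunk), the same pickle string is received, but a new instance is created each time that string is unpickled, so whatever was added to `consumed` is lost between chunks. You can see the gist of what's going on here, without any multiprocessing: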
```python
In [1]: import pickle

In [2]: class Thing:
   ...:     def __init__(self):
   ...:         self.called = 0
   ...:     def whoami(self):
   ...:         self.called += 1
   ...:         print("{} called {} times".format(self, self.called))

In [3]: pickled = pickle.dumps(Thing().whoami)

In [4]: pickle.loads(pickled)()
<__main__.Thing object at 0x10a636898> called 1 times

In [5]: pickle.loads(pickled)()
<__main__.Thing object at 0x10a6c6550> called 1 times

In [6]: pickle.loads(pickled)()
<__main__.Thing object at 0x10a6bd940> called 1 times
```
The `id` of each `Thing` instance is different, and each has its own `called` attribute.
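To connect this back to the `chunksize` numbers in the question, here is a single-process sketch. It is a simplification, not `Pool`'s actual internals: `simulate_map` is a hypothetical helper that unpickles one fresh copy of the bound method per chunk, and the `logged` list is a hypothetical stand-in for the `logging.info` calls.

```python
import pickle


class Worker:
    def __init__(self):
        self.consumed = set()
        self.logged = []  # hypothetical stand-in for logging.info calls

    def consume(self, i):
        if i not in self.consumed:
            self.logged.append(i)
            self.consumed.add(i)


def simulate_map(func, items, chunksize):
    """Hypothetical helper: run func over items chunk by chunk, giving each
    chunk its own unpickled copy of func, loosely like a Pool worker does.
    Returns the total number of 'log' events across all chunks."""
    # Pickling a bound method pickles the instance it is bound to as well.
    payload = pickle.dumps(func)
    log_count = 0
    for start in range(0, len(items), chunksize):
        chunk = items[start:start + chunksize]
        # Each unpickle builds a brand-new Worker with an empty `consumed` set.
        fresh = pickle.loads(payload)
        for item in chunk:
            fresh(item)
        log_count += len(fresh.__self__.logged)
    return log_count


if __name__ == '__main__':
    print(simulate_map(Worker().consume, [1] * 100, chunksize=1))
    print(simulate_map(Worker().consume, [1] * 100, chunksize=20))
```

Because each chunk's copy logs `1` exactly once, the count equals the number of chunks: 100 for `chunksize=1` and 5 for `chunksize=20`, matching the behaviour described in the question. The original `Worker` in the calling process is never modified, since only the unpickled copies are called.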