
Python multiprocess with class instance

I have a question that is not really about a problem I have, but rather about why it is not a problem. Perhaps it is a bit dumb, but I am not very familiar with classes and I'm trying to learn. Let's say I have a class defined as follows:

import numpy as np
import multiprocessing as mp


class Foo(object):
    def __init__(self, a):
        self.a = a

    def Sum(self, b):
        self.a = np.random.randint(10)
        return self.a + b, self.a

and I create an object:

foo = Foo(1)

then I want to compute the result of Sum for different values of b, in parallel between different processes:

def Calc(b):
    return foo.Sum(b)

pool = mp.Pool(processes=2)
b = [0, 1, 2, 3]
out = pool.map(Calc, b)
print(out)

which prints (in one case as it is random):

[(8, 8), (5, 4), (3, 1), (7, 4)]

which is correct. My question is: how can the different processes modify an instance attribute (a in our case) at the same time without affecting each other? In this example the operation is quick, but in my real-world case it takes several seconds if not minutes, hence the parallelization.

Each process is self-contained and shares no memory with the others. Once foo exists in different processes, the copies are no longer the same object: each process has its own foo doing its own thing. Your question isn't really about classes or class instances but about what happens in different processes.
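The independence of the copies can be mimicked inside a single process. An object that is actually passed to a worker (as an argument or a return value) is pickled on one side and rebuilt on the other, so a pickle round trip gives a rough picture of what the worker ends up holding. This is only a sketch: round_trip and copy_of_foo are names made up for the illustration, and no second process is started.

```python
import pickle


class Foo:
    def __init__(self, a):
        self.a = a


def round_trip(obj):
    """Serialize and rebuild obj, as happens to objects sent between processes."""
    return pickle.loads(pickle.dumps(obj))


foo = Foo(1)
copy_of_foo = round_trip(foo)  # stands in for the object a worker would hold
copy_of_foo.a = 99             # the "worker" changes its own copy

print(copy_of_foo is foo)  # False: two distinct objects
print(foo.a)               # 1: the original is untouched
```

In the code from the question foo is a module-level global rather than an argument, so it reaches the workers by a different route, but the outcome is the same: each process ends up with its own separate object.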

Printing the id of the instance along with its a attribute inside each worker process shows the separate copies.

import multiprocessing as mp
import numpy as np

class Foo(object):
    def __init__(self, a):
        self.a = a
    def Sum(self, b):
        s = f'I am {id(self)}, a before={self.a}'
        self.a = np.random.randint(10)
        print(f'{s} | a after={self.a}')
        return self.a + b, self.a

foo = Foo(1)

def Calc(b):
    return foo.Sum(b)

if __name__ == '__main__':

    print(f'original foo id:{id(foo)}')

    pool = mp.Pool(processes=2)
    b = [0, 1, 2, 3, 5, 6, 7, 8]
    out = pool.map(Calc, b)
    print(out)
    print(f'{id(foo)}.a is still {foo.a}')
    # shut down the worker processes now that all results are collected
    pool.terminate()

Then running from a command prompt:

PS C:\pyprojects> py -m tmp
original foo id:2235026702928
I am 1850261105632, a before=1 | a after=4
I am 1905926138848, a before=1 | a after=1
I am 1850261105632, a before=4 | a after=8
I am 1905926138848, a before=1 | a after=9
I am 1850261105632, a before=8 | a after=2
I am 1905926138848, a before=9 | a after=9
I am 1850261105632, a before=2 | a after=7
I am 1905926138848, a before=9 | a after=3
[(4, 4), (2, 1), (10, 8), (12, 9), (7, 2), (15, 9), (14, 7), (11, 3)]
2235026702928.a is still 1
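The explicit pool.terminate() call in the script above can also be left to a with block, since leaving the block of a Pool used as a context manager calls terminate() for you. A minimal sketch of that variant follows. It assumes the standard library's random.randrange(10) as a stand-in for np.random.randint(10) so that it runs without NumPy; everything else mirrors the script above.

```python
import multiprocessing as mp
import random


class Foo:
    def __init__(self, a):
        self.a = a

    def Sum(self, b):
        # random.randrange(10) is a stand-in for np.random.randint(10)
        self.a = random.randrange(10)
        return self.a + b, self.a


foo = Foo(1)


def Calc(b):
    return foo.Sum(b)


if __name__ == '__main__':
    # Leaving the with block calls pool.terminate() automatically.
    with mp.Pool(processes=2) as pool:
        out = pool.map(Calc, [0, 1, 2, 3])
    print(out)
    # Only the workers' copies were modified, not the parent's foo.
    print(f'foo.a in the parent is still {foo.a}')
```

Because map() only returns once every result is in, terminating the pool right afterwards does not cut any work short.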

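A variant that returns the process id and the before/after string from each call instead of printing inside the workers, then sorts the results by process id: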

import multiprocessing as mp
import numpy as np
import os

class Foo(object):
    def __init__(self, a):
        self.a = a
    def Sum(self, b):
        s = f'I am {id(self)}, a: before={self.a}'
        self.a = np.random.randint(10)
        s = f'{s} | after={self.a}'
        return os.getpid(),s,(self.a + b, self.a),b

foo = Foo(1)

def Calc(b):
    return foo.Sum(b)

if __name__ == '__main__':

    print(f'original foo id:{id(foo)}')

    pool = mp.Pool(processes=2)
    b = [0, 1, 2, 3, 5, 6, 7, 8]
    out = pool.map(Calc, b)
    out.sort(key=lambda x: (x[0],x[-1]))
    for result in out:
        print(f'pid:{result[0]} b:{result[-1]} {result[1]} {result[2]}')
    print(f'{id(foo)}.a is still {foo.a}')
    pool.terminate()

...

PS C:\pyprojects> py -m tmp
original foo id:2466513417648
pid:10460 b:1 I am 2729330535728, a: before=1 | after=2 (3, 2)
pid:10460 b:3 I am 2729330535728, a: before=2 | after=5 (8, 5)
pid:10460 b:6 I am 2729330535728, a: before=5 | after=2 (8, 2)
pid:10460 b:8 I am 2729330535728, a: before=2 | after=2 (10, 2)
pid:13100 b:0 I am 2799588470064, a: before=1 | after=1 (1, 1)
pid:13100 b:2 I am 2799588470064, a: before=1 | after=6 (8, 6)
pid:13100 b:5 I am 2799588470064, a: before=6 | after=8 (13, 8)
pid:13100 b:7 I am 2799588470064, a: before=8 | after=0 (7, 0)
2466513417648.a is still 1
PS C:\pyprojects>

Each process works with its own memory, so one process cannot modify the attribute held by another process's copy of the object. If, on the other hand, you do the same with threads, they all share one object and you can run into race conditions.
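For contrast, here is a rough sketch of the thread case. It assumes a hypothetical Add method and a _lock attribute that are not part of the original class. Add replaces the random Sum so the result is predictable, and the lock guards the read-modify-write on the shared attribute.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class Foo:
    def __init__(self, a):
        self.a = a
        self._lock = threading.Lock()  # hypothetical addition for this sketch

    def Add(self, b):
        # Read-modify-write on shared state: the lock keeps updates from
        # different threads from overwriting each other.
        with self._lock:
            self.a = self.a + b
        return id(self)


foo = Foo(1)

with ThreadPoolExecutor(max_workers=2) as executor:
    seen_ids = set(executor.map(foo.Add, [0, 1, 2, 3]))

print(seen_ids == {id(foo)})  # True: every thread used the very same object
print(foo.a)                  # 7: 1 + 0 + 1 + 2 + 3, visible to the caller
```

Unlike the process version, the caller's foo.a does change here, which is exactly why unsynchronized updates from several threads can interfere with each other.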
