
Python multiprocess with class instance

I have a question that is not really about a problem I have, but about why something is not a problem. Perhaps it is a bit dumb, but I am not very familiar with classes and I'm trying to learn. Let's say I have a class defined as follows:

import numpy as np
import multiprocessing as mp


class Foo(object):
    def __init__(self, a):
        self.a = a

    def Sum(self, b):
        self.a = np.random.randint(10)
        return self.a + b, self.a

and I create an object:

foo = Foo(1)

Then I want to compute the result of Sum for different values of b, in parallel across different processes:

def Calc(b):
    return foo.Sum(b)

pool = mp.Pool(processes=2)
b = [0, 1, 2, 3]
out = pool.map(Calc, b)
print(out)

which prints (in one run, since the values are random):

[(8, 8), (5, 4), (3, 1), (7, 4)]

which is correct. My question is: how can the different processes modify an attribute of the instance (a in our case) at the same time without affecting each other? In this example the operation is quite quick, but in my real-world case it takes several seconds if not minutes, hence the parallelization.

Each process is self-contained and there is no communication between them. Once the foo object ends up in different processes, those objects are no longer the same thing: there are several of them, each doing its own thing. Your question isn't really about classes or class instances but about what happens in different processes.
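As a rough single-process analogy (an assumption made for illustration, not what multiprocessing literally does internally), copy.deepcopy can stand in for the private copy of foo that each worker ends up with. The name worker_copy is hypothetical. Changing the copy leaves the original untouched:

```python
import copy


class Foo(object):
    def __init__(self, a):
        self.a = a


foo = Foo(1)

# Stand-in for the separate object a worker process would hold.
# (Hypothetical name; deepcopy is only an analogy for "separate memory".)
worker_copy = copy.deepcopy(foo)

# The "worker" changes its own object...
worker_copy.a = 99

# ...and the original never sees the change.
print(worker_copy is foo)  # False
print(foo.a)               # 1
```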

Printing the id of the instance along with its a attribute illustrates this.

import multiprocessing as mp
import numpy as np

class Foo(object):
    def __init__(self, a):
        self.a = a
    def Sum(self, b):
        s = f'I am {id(self)}, a before={self.a}'
        self.a = np.random.randint(10)
        print(f'{s} | a after={self.a}')
        return self.a + b, self.a

foo = Foo(1)

def Calc(b):
    return foo.Sum(b)

if __name__ == '__main__':

    print(f'original foo id:{id(foo)}')

    pool = mp.Pool(processes=2)
    b = [0, 1, 2, 3, 5, 6, 7, 8]
    out = pool.map(Calc, b)
    print(out)
    print(f'{id(foo)}.a is still {foo.a}') 
    # not sure why this is necessary
    pool.terminate()

Then running from a command prompt:

PS C:\pyprojects> py -m tmp
original foo id:2235026702928
I am 1850261105632, a before=1 | a after=4
I am 1905926138848, a before=1 | a after=1
I am 1850261105632, a before=4 | a after=8
I am 1905926138848, a before=1 | a after=9
I am 1850261105632, a before=8 | a after=2
I am 1905926138848, a before=9 | a after=9
I am 1850261105632, a before=2 | a after=7
I am 1905926138848, a before=9 | a after=3
[(4, 4), (2, 1), (10, 8), (12, 9), (7, 2), (15, 9), (14, 7), (11, 3)]
2235026702928.a is still 1

Playing with the print strings, so the output can be sorted by process id and by b:

import multiprocessing as mp
import numpy as np
import os

class Foo(object):
    def __init__(self, a):
        self.a = a
    def Sum(self, b):
        s = f'I am {id(self)}, a: before={self.a}'
        self.a = np.random.randint(10)
        s = f'{s} | after={self.a}'
        return os.getpid(),s,(self.a + b, self.a),b

foo = Foo(1)

def Calc(b):
    return foo.Sum(b)

if __name__ == '__main__':

    print(f'original foo id:{id(foo)}')

    pool = mp.Pool(processes=2)
    b = [0, 1, 2, 3, 5, 6, 7, 8]
    out = pool.map(Calc, b)
    out.sort(key=lambda x: (x[0],x[-1]))
    for result in out:
        print(f'pid:{result[0]} b:{result[-1]} {result[1]} {result[2]}')
    print(f'{id(foo)}.a is still {foo.a}')
    pool.terminate()

... ...

PS C:\pyprojects> py -m tmp
original foo id:2466513417648
pid:10460 b:1 I am 2729330535728, a: before=1 | after=2 (3, 2)
pid:10460 b:3 I am 2729330535728, a: before=2 | after=5 (8, 5)
pid:10460 b:6 I am 2729330535728, a: before=5 | after=2 (8, 2)
pid:10460 b:8 I am 2729330535728, a: before=2 | after=2 (10, 2)
pid:13100 b:0 I am 2799588470064, a: before=1 | after=1 (1, 1)
pid:13100 b:2 I am 2799588470064, a: before=1 | after=6 (8, 6)
pid:13100 b:5 I am 2799588470064, a: before=6 | after=8 (13, 8)
pid:13100 b:7 I am 2799588470064, a: before=8 | after=0 (7, 0)
2466513417648.a is still 1
PS C:\pyprojects>

Each process works with its own memory, so it cannot modify an attribute of an object that lives in another process. On the other hand, if you do the same with threads, you will run into race conditions, because all threads share the same object.
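For contrast, here is a minimal sketch of the thread case. It assumes concurrent.futures.ThreadPoolExecutor as the thread pool (my choice, not something from the original post) and replaces the random assignment with an increment so the effect is easy to see. Every call sees the same foo, and the changes are visible in the main thread afterwards:

```python
from concurrent.futures import ThreadPoolExecutor


class Foo(object):
    def __init__(self, a):
        self.a = a

    def Sum(self, b):
        # Read-modify-write on shared state: this is not atomic, which is
        # where race conditions come from when several threads run it.
        self.a = self.a + 1
        return id(self)


foo = Foo(0)


def Calc(b):
    return foo.Sum(b)


# Two worker threads, all operating on the one and only foo.
with ThreadPoolExecutor(max_workers=2) as executor:
    ids = list(executor.map(Calc, range(8)))

# Every call ran against the very same object as the main thread's foo.
same_object = all(i == id(foo) for i in ids)
print(f'same object in every call: {same_object}')
print(f'foo.a in the main thread is now {foo.a}')
```

Unlike the multiprocessing runs above, foo.a does not stay at its initial value here. If the real Sum takes a long time and touches self.a, the threads would need something like a threading.Lock around the update, or each task should work on its own data.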
