Multiprocessing with Pool in Python: several instances with the same name at the same time

I'm kind of new to multiprocessing. However, assume that we have the program below. The program seems to work fine. Now to the question: in my opinion we will have 4 instances of SomeKindOfClass with the same name (a) at the same time. How is that possible? Moreover, is there a potential risk with this kind of programming?

from multiprocessing.dummy import Pool  # thread-based pool, not separate processes
import numpy
from theFile import SomeKindOfClass

n = 8
allOutputs = numpy.zeros(n)

def work(index):
    a = SomeKindOfClass()
    a.theSlowFunction()
    allOutputs[index] = a.output

pool = Pool(processes=4)
pool.map(work, range(0, n))

The name a is local to your work function, so there is no conflict of names here. Internally, Python keeps track of each class instance by a unique identity, regardless of which local names refer to it. If you want to check this, you can inspect the object id using the id function:

print(id(a))

I don't see any issues with your code.

Actually, you will have 8 instances of SomeKindOfClass (one per call to work, i.e. one per task), but at most 4 will ever be alive at the same time, since the pool has 4 workers and each instance becomes garbage once its call to work returns.
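If you want to see this for yourself, here is a minimal sketch. SomeKindOfClass is stubbed out, since theFile isn't shown, and work returns the instance so all 8 stay alive and their ids can't be reused:

from multiprocessing.dummy import Pool

# Stub standing in for the real class from theFile.
class SomeKindOfClass:
    def theSlowFunction(self):
        self.output = 42

def work(index):
    a = SomeKindOfClass()
    a.theSlowFunction()
    return a  # keep the instance alive so its id can't be recycled

pool = Pool(processes=4)
instances = pool.map(work, range(8))
print(len({id(a) for a in instances}))  # prints 8: eight distinct objects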

multiprocessing vs multiprocessing.dummy

Your program will only behave this way as long as you keep using the multiprocessing.dummy module, which is just a wrapper around the threading module: you are still using Python threads, not separate processes. Python threads share the same global state; processes don't. Python threads also share the same GIL, so they are still limited to executing one Python bytecode instruction at a time, unlike processes, which can all run Python code simultaneously.

If you were to change your import to from multiprocessing import Pool , you would notice that the allOutputs array remains unchanged after all the workers finish executing (you would also likely get an error, because you're creating the pool in the global scope; you should put that inside a main() function). This is because multiprocessing gives each new process its own copy of the entire global state. When a worker modifies the global allOutputs , it is modifying its own copy of that initial global state. When the process ends, nothing is returned to the main process, and the global state of the main process remains unchanged.
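A quick way to see this, again with SomeKindOfClass stubbed out since the real class isn't shown:

import multiprocessing
import numpy

n = 8
allOutputs = numpy.zeros(n)

# Stub standing in for the real class.
class SomeKindOfClass:
    def theSlowFunction(self):
        self.output = 1.0

def work(index):
    # This writes into the *child process's* copy of allOutputs.
    a = SomeKindOfClass()
    a.theSlowFunction()
    allOutputs[index] = a.output

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    pool.map(work, range(n))
    print(allOutputs)  # still all zeros in the parent process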

Sharing State Between Processes

Unlike threads, processes don't share the same memory.

If you want to share state between processes, you have to explicitly declare shared variables and pass them to each process, or use pipes or some other method to allow the worker processes to communicate with each other or with the main process.

There are several ways to do this, but perhaps the simplest is to use the Manager class:

import multiprocessing

from theFile import SomeKindOfClass  # as in the question

def worker(args):
    index, array = args
    a = SomeKindOfClass()
    a.some_expensive_function()
    array[index] = a.output  # writes go through the manager process


def main():
    n = 8
    manager = multiprocessing.Manager()
    array = manager.list([0] * n)  # a proxy list shared between processes
    pool = multiprocessing.Pool(4)
    pool.map(worker, [(i, array) for i in range(n)])
    print(array)


if __name__ == '__main__':
    main()

You can declare class instances inside the pool workers, because each instance gets its own place in memory, so they don't conflict. The problem arises if you create one instance first and then try to pass that single instance into multiple pool workers: each worker would then hold a reference to the same object, and it will fail (this can be handled, just not this way).

Basically, pool workers must not have overlapping memory anywhere. As long as the workers don't try to share memory, or perform operations that may collide (like printing to the same file), there shouldn't be any problem.

Make sure whatever they're supposed to produce (like something you want printed to a file, or added to a broader namespace somewhere) is returned as a result at the end, which you then iterate through, as in the sketch below.
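A minimal sketch of that return-the-result pattern (the worker body here is a placeholder, not the question's actual computation):

import multiprocessing

def worker(index):
    # ... do the slow work here ...
    result = index ** 2  # placeholder standing in for a.output
    return index, result

if __name__ == '__main__':
    pool = multiprocessing.Pool(processes=4)
    # map gathers each worker's return value back in the main process
    for index, output in pool.map(worker, range(8)):
        print(index, output)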

If you are using multiprocessing you shouldn't worry: processes don't share memory (by default). So there is no risk in having several independent objects of class SomeKindOfClass; each of them will live in its own process. How does it work? Python runs your program and then launches 4 child processes. That's why it's very important to have the if __name__ == '__main__': guard before pool.map(work, range(0, n)) . Otherwise, on platforms that spawn new interpreters rather than fork (such as Windows), you can get an endless loop of process creation.
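A minimal sketch of that structure (the worker body is a placeholder):

from multiprocessing import Pool

def work(index):
    return index  # placeholder for the real work

if __name__ == '__main__':
    # The guard keeps child processes from re-executing the pool setup
    # when they import this module.
    pool = Pool(processes=4)
    print(pool.map(work, range(8)))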

Problems could arise if SomeKindOfClass keeps state on disk, for example by writing to or reading from a file.
