
Does multiprocessing in Python re-initialize globals?

I have a multiprocessing program where I'm unable to work with global variables. The program starts like this:

from multiprocessing import Process, Pool
print("Initializing")
someList = []
...
...
... 

That is, someList is a global variable that gets initialized before my main code runs.

Later on in the code, someList is set to some value, and then I create 4 processes to process it:

pool = Pool(4)
combinedResult = pool.map(processFn, someList)
pool.close()
pool.join()

Before spawning the processes, someList is set to a valid value.

However, when the processes are spawned, I see that print appear four more times:

Initializing
Initializing
Initializing
Initializing

Clearly, the initialization section at the top of the program is being run again in each process. someList also gets reset to empty. If my understanding is correct, each process should be a replica of the parent process's state, which essentially means I should have gotten 4 copies of the same list. Why are the globals being re-initialized? And why is that section even being run again?

Can someone please explain this to me? I referred to the Python docs but wasn't able to determine the root cause. They do recommend against using globals, and I'm aware of that, but it still doesn't explain why the initialization code runs again. Also, I'd like to use multiprocessing and not multithreading. I'm trying to understand how multiprocessing works here.

Thanks for your time.

On Windows, processes are not forked as on Linux/Unix. Instead they are spawned, which means that a new Python interpreter is started for each new multiprocessing.Process. The child re-imports your module, so all global variables are re-initialized, and any changes you have made to them along the way will not be seen by the spawned processes.
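As a rough illustration (the function name work is made up, and forcing set_start_method("spawn", ...) is only there so the behaviour can also be reproduced on Linux, where spawn is not the default), the module-level print below runs once in the parent and once more in every spawned worker, while the guarded block runs only in the parent:

import multiprocessing as mp

print("Initializing")  # module-level code: re-runs in every spawned worker

def work(x):
    return x * 2

if __name__ == "__main__":
    # "spawn" is the default on Windows; forcing it here makes the
    # re-import behaviour visible on Linux as well.
    mp.set_start_method("spawn", force=True)
    with mp.Pool(2) as pool:
        print(pool.map(work, [1, 2, 3]))  # [2, 4, 6]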

A solution to the problem is to pass the globals to the Pool initializer and, from there, make them global in the spawned processes as well:

from multiprocessing import Pool

def init_pool(the_list):
    # Runs once in every worker process and stores the passed-in list
    # as a global that the worker can then read.
    global some_list
    some_list = the_list

def access_some_list(index):
    return some_list[index]

if __name__ == "__main__":
    some_list = [24, 12, 6, 3]
    indexes = [3, 2, 1, 0]
    pool = Pool(initializer=init_pool, initargs=(some_list,))
    result = pool.map(access_some_list, indexes)
    print(result)  # [3, 6, 12, 24]

In this setup the globals are copied to each new process, where they are then accessible. However, as always, any updates made from there on will not be propagated to the other processes. For that you would need something like a proper multiprocessing.Manager.
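For completeness, here is a minimal sketch of that Manager approach (the record function and the doubling are just illustrative). The manager keeps the list in a separate server process and hands out a proxy that can be passed to the workers, so appends made in the workers are visible back in the parent:

from multiprocessing import Pool, Manager

def record(args):
    shared_list, value = args
    # The proxy forwards the append to the manager process,
    # so the parent and the other workers can all see it.
    shared_list.append(value * 2)

if __name__ == "__main__":
    with Manager() as manager:
        shared_list = manager.list()
        with Pool(4) as pool:
            pool.map(record, [(shared_list, v) for v in range(8)])
        print(list(shared_list))  # all 8 results show up here; order may vary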

As an extra comment, this makes it clear that global variables can be dangerous, because it is hard to keep track of which values they will have in the different processes.

I think the point is that you are creating 4 processes, and each of them executes the code you give it. They all run the same code.

So maybe you could use multithreading instead, or use some if-clauses etc. to determine which process should execute which code.

  • cheers
