In multiprocessing, how can child processes access a global variable from the parent process?

import multiprocessing

# list with global scope
result = [100,200]

def square_list(mylist):
    """
    function to square a given list
    """
    global result
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist,))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()

    # print global result list
    print("Result(in main program): {}".format(result))

Here, the global variable result can be accessed by the function that is running in a new process. Since the new process has its own Python interpreter and its own memory space, how can it access the global variable from the parent process?

Note: I understand the concepts of queue/pipe/manager/array/value. This question specifically asks how the child process has READ access to a global variable from the parent process.

As I mentioned in my comment to your question, you should use a managed list that is passed as an additional argument to square_list:

import multiprocessing

def square_list(result, mylist):
    """
    function to square a given list
    """
    # append squares of mylist to the managed result list passed in
    for num in mylist:
        result.append(num * num)
    # print the managed result list
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    result = multiprocessing.Manager().list([100,200])

    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(result, mylist))
    # starting process
    p1.start()
    # wait until process is finished
    p1.join()

    # print global result list
    print("Result(in main program): {}".format(result))

Prints:

Result(in process p1): [100, 200, 1, 4, 9, 16]
Result(in main program): [100, 200, 1, 4, 9, 16]

Notes

If your subprocess ("child process") were only reading the result list, then your code would be fine as is. But things get a bit complicated when you want to update the list and have it reflected back to the main process.

There are two ways a subprocess can update an object that has been created by the main process (I will ultimately get to the issue of the object actually being a global variable):

  1. The object can be allocated in shared memory so that both processes are actually accessing the same storage although in general they "live" in separate address spaces.
  2. The object is a "managed" object represented by a reference to a proxy through which all access is made. When an update to the object is made via the proxy, data is actually being transferred from one address space to another using either sockets or named pipes depending on the platform and other considerations. Thus this is more akin to a remote procedure call.
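
To make the second case concrete, here is a minimal sketch (my illustration, not part of the original answer): the object returned by Manager().list() is a ListProxy rather than a plain list, and every operation on it is forwarded to the manager's server process:

import multiprocessing

# illustrative sketch: a managed list is really a proxy object
if __name__ == "__main__":
    with multiprocessing.Manager() as manager:
        result = manager.list([100, 200])
        print(type(result))   # <class 'multiprocessing.managers.ListProxy'>
        result.append(300)    # the call is forwarded to the manager's server process
        print(list(result))   # [100, 200, 300] -- copies the data back locally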

Let's take the case of a simple shared memory object being updated using a global variable. For this I will use a simple multiprocessing.Value instance to create a shared integer:

import multiprocessing

v = multiprocessing.Value('i', 1) # initialize to 1

def worker():
    v.value += 10

if __name__ == "__main__":
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()

    print(v.value)

On Windows, this prints 1 and not 11 as you might expect. This is because on Windows new processes are created using the spawn method: a new, empty address space is created, a new Python interpreter is launched, and the source is re-executed from the top. Any code at global scope is executed except code within the if __name__ == "__main__": block, since in the newly minted process __name__ will not be "__main__" (and that is a good thing, since otherwise you would get into a recursive loop re-creating new subprocesses).

But this means that the subprocess has just created its own instance of the global variable v. So for this to work, v cannot be global and must be passed as an argument to worker.
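
The difference between start methods can be demonstrated directly. The following sketch (my addition; names like show_global and data are illustrative, and it assumes a Unix-like platform where both fork and spawn are available) runs the same child under each method:

import multiprocessing

def show_global():
    # 'data' refers to whatever global exists in *this* process
    try:
        print("child sees:", data)
    except NameError:
        print("child sees: NameError -- 'data' was never created in this process")

if __name__ == "__main__":
    data = [1, 2, 3]  # created only when running as the main script

    for method in ("fork", "spawn"):
        print("start method:", method)
        ctx = multiprocessing.get_context(method)
        p = ctx.Process(target=show_global)
        p.start()
        p.join()

With fork the child inherits the parent's address space and prints the list; with spawn the module is re-imported, the if __name__ == "__main__": block is skipped, and the child hits the NameError.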

But there is a way if you instead use a multiprocessing pool. This facility allows you to initialize each process in the pool with a special pool-initializer function:

import multiprocessing

# initialize each process (there is only 1) in the pool
def init_pool(shared_v):
    global v
    v = shared_v # v is global

def worker():
    v.value += 10

if __name__ == "__main__":
    v = multiprocessing.Value('i', 1) # I am global

    # create pool of size 1:
    pool = multiprocessing.Pool(1, initializer=init_pool, initargs=(v,))
    pool.apply(worker)
    pool.close()

    # v here is the same shared Value the worker updated
    print(v.value)

Prints:

11

Unfortunately, it is a bit of work to implement a list using the shared-memory data types available (a sketch of that route appears after the next example). That is why I recommended using a managed list:

import multiprocessing

# initialize each process (there is only 1) in the pool
def init_pool(shared_result):
    global result
    result = shared_result # result is global

def square_list(mylist):
    """
    function to square a given list
    """
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    result = multiprocessing.Manager().list([100,200])

    pool = multiprocessing.Pool(1, initializer=init_pool, initargs=(result,))
    pool.apply(square_list, args=(mylist,))

    # print global result list
    print("Result(in main program): {}".format(result))

Prints:

Result(in process p1): [100, 200, 1, 4, 9, 16]
Result(in main program): [100, 200, 1, 4, 9, 16]

This technique works on Windows, Linux, etc., i.e., on all platforms.
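
For comparison, here is a rough sketch (my own illustration, not from the original answer) of the shared-memory route using multiprocessing.Array. Because the array is fixed-size, room for the squares must be preallocated up front, which is part of the extra work alluded to earlier:

import multiprocessing

def square_list(arr, n_initial, mylist):
    # write squares into the preallocated tail of the shared array
    for i, num in enumerate(mylist):
        arr[n_initial + i] = num * num
    print("Result(in process p1): {}".format(list(arr)))

if __name__ == "__main__":
    mylist = [1, 2, 3, 4]
    initial = [100, 200]
    # fixed-size shared array: the initial values plus zeroed slots for the squares
    arr = multiprocessing.Array('i', initial + [0] * len(mylist))

    p1 = multiprocessing.Process(target=square_list, args=(arr, len(initial), mylist))
    p1.start()
    p1.join()

    print("Result(in main program): {}".format(list(arr)))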

Move your updatable global variables to within the if __name__ == '__main__': block (they are still global to the main process) and use a pool-initializer function to initialize the pool processes with these variables. In fact, for platforms that use spawn, you should consider moving all global definitions that are not required by subprocesses and are expensive to create to within the if __name__ == '__main__': block.

One of the key tenets of processes vs. threads is that they don't share memory. There are a few facilities that exist to actually share memory, but in general with processes you should pass messages via queues, pipes, etc.

Here is an example of passing a return value back to the parent via a queue:

import multiprocessing

# list with global scope
result = [100,200]  # result is re-created on import here in the child process

def square_list(mylist, ret_q):
    """
    function to square a given list
    """
    global result 
    # append squares of mylist to global list result
    for num in mylist:
        result.append(num * num)
    # print global list result
    print("Result(in process p1): {}".format(result))
    ret_q.put(result)  # send the modified result to the main process

if __name__ == "__main__":
    # input list
    mylist = [1,2,3,4]

    #return queue
    ret_q = multiprocessing.Queue()

    # creating new process
    p1 = multiprocessing.Process(target=square_list, args=(mylist, ret_q))
    # starting process
    p1.start()
    # wait for the result
    result = ret_q.get()
    # wait until process is finished
    p1.join()

    # print global result list
    print("Result(in main program): {}".format(result))
