The multiprocessing module in Python and modifying a shared global variable

Question

I have written a small Python program to check whether I understand how global variables are passed to "child" processes.

import multiprocessing
import time
import random

# Python 2: range() returns a list, so it can be modified in place
shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var

When I run it I get:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11]
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11]
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11]
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11]
6 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 100, 7, 8, 9, 10, 11]
7 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 100, 8, 9, 10, 11]
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]

This is logical: when a child process modifies the global variable, the copy-on-write mechanism copies it, so the change is visible only in the process that made it.

What surprised me was the result when I modified the code to print the identifiers of the variables:

import multiprocessing
import time
import random

shared_var = range(12)

def f(x):
    global shared_var
    time.sleep(1+random.random())
    shared_var[x] = 100
    print x, multiprocessing.current_process(), shared_var, id(shared_var)
    return x*x

if __name__ == '__main__':
    pool = multiprocessing.Pool(4)
    results = pool.map(f, range(8))
    print results
    print shared_var, id(shared_var)

And got:

3 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
0 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
1 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
2 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968
6 <Process(PoolWorker-2, started daemon)> [0, 100, 2, 3, 4, 5, 100, 7, 8, 9, 10, 11] 4504973968
7 <Process(PoolWorker-3, started daemon)> [0, 1, 100, 3, 4, 5, 6, 100, 8, 9, 10, 11] 4504973968
4 <Process(PoolWorker-4, started daemon)> [0, 1, 2, 100, 100, 5, 6, 7, 8, 9, 10, 11] 4504973968
5 <Process(PoolWorker-1, started daemon)> [100, 1, 2, 3, 4, 100, 6, 7, 8, 9, 10, 11] 4504973968
[0, 1, 4, 9, 16, 25, 36, 49]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11] 4504973968

The identifiers of all the variables (in the main process and in the spawned processes) are the same, while I expected a separate copy, with a different identifier, for each of the processes...

Does anyone know why I got these results? Some references on how multiprocessing deals with global variables being read or written by the processes it creates would also be great. Thanks!

Answer 1

I think there's some confusion about the memory. You are not using multithreading but multiprocessing, so each worker runs in a separate process with its own virtual memory space. Therefore, each process has its own copy of shared_var from the very beginning. That copy is what gets modified in each call to f(x), leaving the actual variable in __main__ unaffected.

You can check the docs for the chapter on sharing memory between processes, e.g. using multiprocessing.Array.
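As a rough illustration of that suggestion, here is a minimal Python 3 sketch using multiprocessing.Array. The names set_slot and run_demo are made up for this example, and it assumes a POSIX system where the 'fork' start method is available, which matches the setup in the question. It is one possible way to share the data, not the only one.

```python
import multiprocessing


def set_slot(shared_arr, index):
    # The array lives in shared memory, so this write is visible to the parent.
    with shared_arr.get_lock():
        shared_arr[index] = 100


def run_demo():
    # Assumption: the 'fork' start method exists (Linux, macOS).
    ctx = multiprocessing.get_context('fork')
    # 'i' is the typecode for a signed int; initial contents are 0..11.
    shared_arr = ctx.Array('i', list(range(12)))
    workers = [
        ctx.Process(target=set_slot, args=(shared_arr, i))
        for i in range(8)
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    # Slicing the synchronized array gives a plain list copy.
    return shared_arr[:]


if __name__ == '__main__':
    print(run_demo())
```

Unlike the plain list in the question, the parent here ends up with [100, 100, 100, 100, 100, 100, 100, 100, 8, 9, 10, 11], because every process writes to the same shared block instead of to a private copy.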

I'm not 100% sure why the address stays the same, but I think that since each new subprocess is created by forking the main process and copying its memory layout, the addresses in virtual memory remain the same in each of the children. The physical memory behind that address is different once a child has written to it. That's why you see the same id but different values.
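A small Python 3 sketch can make this visible: the child reports the id of its own shared_var and the value it wrote, and the parent compares them with its own. The function names child and compare_ids are invented for this example, and it assumes the 'fork' start method is available (with 'spawn' the child re-imports the module and the ids need not match).

```python
import multiprocessing

shared_var = list(range(12))


def child(conn):
    # This write changes only the child's copy of the list.
    shared_var[0] = 100
    conn.send((id(shared_var), shared_var[0]))
    conn.close()


def compare_ids():
    # Assumption: the 'fork' start method exists (Linux, macOS).
    ctx = multiprocessing.get_context('fork')
    parent_conn, child_conn = ctx.Pipe()
    proc = ctx.Process(target=child, args=(child_conn,))
    proc.start()
    child_id, child_value = parent_conn.recv()
    proc.join()
    # Same virtual address in both processes, but different contents.
    return child_id == id(shared_var), child_value, shared_var[0]


if __name__ == '__main__':
    print(compare_ids())
```

With fork, the first element of the result is True while the two values differ (100 in the child, 0 in the parent), which is the same pattern as in the question's output.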

Answer 2

As you may know, id(x) in CPython returns the memory address of the object.
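One way to convince yourself of this is to turn the number back into an object with ctypes. This is only a sketch that relies on a CPython implementation detail; the helper names object_at and id_is_address are invented here, and other Python implementations make no such guarantee about id().

```python
import ctypes


def object_at(address):
    # CPython-specific: treat the integer as a pointer to a Python object.
    return ctypes.cast(address, ctypes.py_object).value


def id_is_address(obj):
    # True when dereferencing id(obj) yields the very same object.
    return object_at(id(obj)) is obj


if __name__ == '__main__':
    data = list(range(12))
    print(id_is_address(data))
```

The address is a virtual address inside the current process, so it is only meaningful within that process.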

Please check https://superuser.com/questions/347765/is-virtual-memory-related-to-virtual-address-space-of-a-process and "Why Virtual Memory Address is the same in different process?". Basically, the operating system gives each process its own virtual address space, and the process has no idea about the actual (physical) memory address of an object.

2 answers

Answer 1
1 2017-12-23 13:23:15

Answer 2
0 2020-08-25 02:35:55
