Multiprocessing in Python with read-only shared memory?

I have a single-threaded Python program, and I'd like to modify it to make use of all 32 processors on the server it runs on. As I envision it, each worker process would receive its job from a queue and submit its output to a queue. To complete its work, however, each worker process would need read-only access to a complex in-memory data structure: many gigabytes of dicts and objects that link to each other. In Python, is there a simple way to share this data structure without making a copy of it for each worker process?

Thanks.

If you are using the CPython (or PyPy) implementation of Python, then the global interpreter lock (GIL) will prevent more than one thread from operating on Python objects at a time.

So if you are using such an implementation, you'll need to use multiple processes instead of multiple threads to take advantage of your 32 processors.

You could use the standard library's multiprocessing or concurrent.futures modules to spawn the worker processes. There are also many third-party options. Doug Hellman's tutorial is a great introduction to the multiprocessing module.
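For example, here is a rough sketch (not from the original answer) of a pool of worker processes consuming jobs and returning results using only the standard library; the work() function and the list of jobs are placeholders for the real workload:

import concurrent.futures

def work(job):
    # Placeholder for the real per-job computation.
    return job * job

if __name__ == "__main__":
    jobs = range(100)
    # max_workers=32 matches the 32 processors mentioned in the question.
    with concurrent.futures.ProcessPoolExecutor(max_workers=32) as pool:
        for result in pool.map(work, jobs):
            print(result)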

Since you only need read-only access to the data structure, if you assign the complex data structure to a global variable before you spawn the processes, then all the processes will have access to this global variable.

When you spawn a process, the globals from the calling module are copied to the spawned process. However, on Linux, which has copy-on-write, the very same data structures are used by the spawned processes, so no extra memory is required. Only when a process modifies the data structure is it copied to a new location.
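A rough sketch of that pattern, assuming a Linux (or other fork-based) start method; big_data and lookup() are illustrative names rather than anything from the original answer:

import multiprocessing

big_data = {}  # stands in for the many-gigabyte structure

def lookup(key):
    # Workers only read the module-level global; they never write to it.
    return big_data.get(key)

if __name__ == "__main__":
    big_data.update({i: i * i for i in range(1000)})
    multiprocessing.set_start_method("fork")  # explicit; already the default on Linux
    with multiprocessing.Pool(processes=32) as pool:
        print(pool.map(lookup, [7, 8, 9]))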

On Windows, since there is no fork, each spawned process starts a fresh Python interpreter and re-imports the calling module, so each process requires memory for its own separate copy of the huge data structure. There must be some other way to share data structures on Windows, but I'm unaware of the details. (Edit: POSH may be a solution to the shared-memory problem, but I haven't tried it myself.)
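Since Python 3.8, the standard library also ships multiprocessing.shared_memory, which can share a raw block of bytes across processes on Windows as well. It holds flat buffers rather than graphs of dicts and objects, so it is only a partial fit for the question, but a minimal sketch looks like this:

from multiprocessing import Process, shared_memory

def reader(name):
    # Attach to the existing block by name and read a few bytes from it.
    shm = shared_memory.SharedMemory(name=name)
    print(bytes(shm.buf[:5]))
    shm.close()

if __name__ == "__main__":
    shm = shared_memory.SharedMemory(create=True, size=16)
    shm.buf[:5] = b"hello"
    p = Process(target=reader, args=(shm.name,))
    p.start()
    p.join()
    shm.close()
    shm.unlink()  # release the block once every process is done with it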

To add a demonstration of unutbu's answer above, here is code showing that it is in fact COW shared memory (CPython 3.6, macOS):

main_shared.py

import multiprocessing
from time import sleep


# Module-level global that the forked child will see without a copy
my_global = None


def test():
    # In the child: read the shared global for a few seconds, then rebind it,
    # which gives this process its own object via copy-on-write.
    global my_global
    read_only_secs = 3
    while read_only_secs > 0:
        sleep(1)
        print(f'child proc global: {my_global} at {hex(id(my_global))}')
        read_only_secs -= 1
    print('child proc writing to copy-on-write...')
    my_global = 'something else'
    while True:
        sleep(1)
        print(f'child proc global: {my_global} at {hex(id(my_global))}')


def set_func():
    # Build the "large" data structure in the parent before the child is started.
    global my_global
    my_global = [{'hi': 1, 'bye': 'foo'}]

if __name__ == "__main__":
    # The parent sets the global, starts a child, and both keep printing the
    # object and its address; the addresses match until the child rebinds it.
    print(f'main proc global: {my_global} at {hex(id(my_global))}')
    set_func()
    print(f'main proc global: {my_global} at {hex(id(my_global))}')
    p1 = multiprocessing.Process(target=test)
    p1.start()

    while True:
        sleep(1)
        print(f'main proc global: {my_global} at {hex(id(my_global))}')

Output

$ python main_shared.py 
main proc global: None at 0x101b509f8
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc writing to copy-on-write...
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc global: something else at 0x1022ea3b0
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc global: something else at 0x1022ea3b0
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
child proc global: something else at 0x1022ea3b0
main proc global: [{'hi': 1, 'bye': 'foo'}] at 0x102341708
