Shared data utilizing multiple processors in Python
I have a CPU-intensive program which uses a large dictionary as data (around 250 MB). I have a multicore processor and want to utilize it so that I can run more than one task at a time.

The dictionary is mostly read-only and may be updated once a day.

How can I write this in Python without duplicating the dictionary?

I understand that Python threads don't offer true concurrency. Can I use the multiprocessing module without the data being serialized between processes?

I come from the Java world, and my requirement would be something like Java threads, which can share data, run on multiple processors, and offer synchronization primitives.
You can share read-only data among processes simply with a fork (on Unix; no easy way on Windows), but that won't catch the "once a day change" (you'd need to put an explicit way in place for each process to update its own copy). Native Python structures like `dict` are just not designed to live at arbitrary addresses in shared memory (you'd have to code a `dict` variant supporting that in C), so they offer no solace.
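A minimal sketch of the fork approach, assuming a Unix system: with the `"fork"` start method, child processes inherit the parent's dictionary via copy-on-write, so nothing is pickled or duplicated up front. The dictionary contents and `worker` function here are hypothetical placeholders.

```python
import multiprocessing as mp

# Hypothetical large read-only dict, built once in the parent process.
BIG_DICT = {i: i * i for i in range(100_000)}

def worker(key):
    # Under the "fork" start method (Unix only), the child inherits
    # BIG_DICT from the parent via copy-on-write; only `key` and the
    # return value are serialized between processes.
    return BIG_DICT[key]

if __name__ == "__main__":
    ctx = mp.get_context("fork")  # raises ValueError on Windows
    with ctx.Pool(processes=4) as pool:
        results = pool.map(worker, [1, 2, 3])
    print(results)  # [1, 4, 9]
```

Note the caveat from above: because each child gets its own copy-on-write snapshot, an update in the parent after the fork is not visible to already-running children.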
You could use Jython (or IronPython) to get a Python implementation with exactly the same multi-threading abilities as Java (or, respectively, C#), including multiple-processor usage by multiple simultaneous threads.
Have a look at multiprocessing in the stdlib: http://docs.python.org/library/multiprocessing.html has many great features that make it very easy to share data structures between processes.
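One of those features is `multiprocessing.Manager`, whose proxied dict would also handle the once-a-day update: all reads and writes go through a single manager process, so every worker sees changes (at the cost of IPC on each access, which is much slower than a local dict). A small sketch with hypothetical worker names:

```python
from multiprocessing import Manager, Process

def worker(shared, i):
    # Writes are forwarded to the manager process, so the parent and
    # any sibling process see them without explicit message passing.
    shared[f"worker-{i}"] = i * 10

def run_demo(n=3):
    with Manager() as mgr:
        shared = mgr.dict()  # proxy to a dict living in the manager
        procs = [Process(target=worker, args=(shared, i)) for i in range(n)]
        for p in procs:
            p.start()
        for p in procs:
            p.join()
        return dict(shared)  # plain-dict snapshot before the manager exits

if __name__ == "__main__":
    print(run_demo())
```

Whether the per-access IPC overhead is acceptable for a 250 MB, read-mostly dictionary depends on how hot the lookups are; for read-heavy workloads the fork approach above is usually far faster.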