Shared memory in multiprocessing

I have three large lists. The first contains bitarrays (module bitarray 0.8.0) and the other two contain arrays of integers.

l1=[bitarray 1, bitarray 2, ... ,bitarray n]
l2=[array 1, array 2, ... , array n]
l3=[array 1, array 2, ... , array n]

These data structures take quite a bit of RAM (~16GB total).

If I start 12 sub-processes using:

multiprocessing.Process(target=someFunction, args=(l1,l2,l3))

Does this mean that l1, l2 and l3 will be copied for each sub-process, or will the sub-processes share these lists? Or, to be more direct, will I use 16GB or 192GB of RAM?

someFunction will read some values from these lists and then perform some calculations based on the values read. The results will be returned to the parent process. The lists l1, l2 and l3 will not be modified by someFunction.

Therefore I would assume that the sub-processes do not need to, and would not, copy these huge lists but would instead just share them with the parent, meaning that the program would take 16GB of RAM (regardless of how many sub-processes I start) due to the copy-on-write approach under Linux. Am I correct, or am I missing something that would cause the lists to be copied?

EDIT: I am still confused after reading a bit more on the subject. On the one hand, Linux uses copy-on-write, which should mean that no data is copied. On the other hand, accessing an object will change its ref-count (I am still unsure why, or what that means). Even so, will the entire object be copied?

For example, if I define someFunction as follows:

import random

def someFunction(list1, list2, list3):
    i = random.randint(0, 99999)
    print(list1[i], list2[i], list3[i])

Would using this function mean that l1, l2 and l3 will be copied entirely for each sub-process?
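From what I have read so far, the ref-count is stored in the object's header itself, so in CPython even a pure read like the one above creates a new reference and writes to the memory page the object lives in. A small check of that behaviour (just an illustration using the standard library, not my actual data):

import sys

big = bytearray(10**6)          # stand-in for one of the large objects
l1 = [big]

print(sys.getrefcount(l1[0]))   # includes a temporary reference held by getrefcount itself
x = l1[0]                       # a plain "read" creates another reference ...
print(sys.getrefcount(l1[0]))   # ... so the count goes up, i.e. the object's header was written to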

Is there a way to check for this?

EDIT2: After reading a bit more and monitoring the total memory usage of the system while the sub-processes are running, it seems that entire objects are indeed copied for each sub-process, and that this is because of reference counting.
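For reference, a minimal sketch of one way to watch per-process memory from the parent (assuming the third-party psutil package; this is just for illustration, not necessarily the tooling used here). On Linux, uss is the memory unique to a process, which is what actually gets copied:

import os
import psutil  # third-party: pip install psutil

def report_memory(tag=""):
    parent = psutil.Process(os.getpid())
    for p in [parent] + parent.children(recursive=True):
        info = p.memory_full_info()  # rss = resident set size, uss = unique set size (Linux)
        print(f"{tag} pid={p.pid} rss={info.rss / 2**20:.0f} MiB uss={info.uss / 2**20:.0f} MiB")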

The reference counting for l1, l2 and l3 is actually not needed in my program. This is because l1, l2 and l3 will be kept in memory (unchanged) until the parent process exits; there is no need to free the memory used by these lists until then. In fact, I know for sure that the reference count will remain above 0 (for these lists and every object in them) until the program exits.

So now the question becomes: how can I make sure that the objects will not be copied to each sub-process? Can I perhaps disable reference counting for these lists and for each object in them?

EDIT3: Just an additional note. The sub-processes do not need to modify l1, l2 and l3 or any objects in these lists. They only need to be able to reference some of these objects without causing the memory to be copied for each sub-process.

Generally speaking, there are two ways to share the same data:

  • Multithreading
  • Shared memory

Python's multithreading is not suitable for CPU-bound tasks (because of the GIL), so the usual solution in that case is to go with multiprocessing. However, with this solution you need to explicitly share the data, using multiprocessing.Value and multiprocessing.Array.

Note that sharing data between processes is usually not the best choice, because of all the synchronization issues; an approach involving actors exchanging messages is usually seen as a better one. See also the Python documentation:

As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes.

However, if you really do need to use some shared data then multiprocessing provides a couple of ways of doing so.

In your case, you need to wrap l1, l2 and l3 in some way understandable by multiprocessing (e.g. by using a multiprocessing.Array), and then pass them as parameters.
Note also that, since you said you do not need write access, you should pass lock=False while creating the objects, or all access will still be serialized.
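A rough sketch of what that could look like, assuming Linux with the default fork start method (the type codes, sizes and the 12-process loop are made up for illustration; bitarrays would first have to be packed into something Array understands, e.g. a byte buffer):

import multiprocessing
import random

def someFunction(shared1, shared2, shared3):
    # the shared arrays support len() and indexing just like lists
    i = random.randint(0, len(shared1) - 1)
    print(shared1[i], shared2[i], shared3[i])

if __name__ == "__main__":
    # lock=False: read-only use, so no synchronization wrapper is needed
    s1 = multiprocessing.Array('b', 100000, lock=False)  # e.g. bitarray contents packed into bytes
    s2 = multiprocessing.Array('i', 100000, lock=False)  # integers
    s3 = multiprocessing.Array('i', 100000, lock=False)  # integers

    processes = [multiprocessing.Process(target=someFunction, args=(s1, s2, s3))
                 for _ in range(12)]
    for p in processes:
        p.start()
    for p in processes:
        p.join()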

Because this is still a very high result on Google and no one else has mentioned it yet, I thought I would mention the new possibility of 'true' shared memory which was introduced in Python version 3.8.0: https://docs.python.org/3/library/multiprocessing.shared_memory.html

I have included here a small contrived example (tested on Linux) where numpy arrays are used, which is likely a very common use case:

# one dimension of the 2d array which is shared
dim = 5000

import numpy as np
from multiprocessing import shared_memory, Process, Lock
from multiprocessing import cpu_count, current_process
import time

lock = Lock()  # created before the fork, so it is inherited by (and shared with) the child processes on Linux

def add_one(shr_name):

    existing_shm = shared_memory.SharedMemory(name=shr_name)
    np_array = np.ndarray((dim, dim,), dtype=np.int64, buffer=existing_shm.buf)
    lock.acquire()
    np_array[:] = np_array[0] + 1
    lock.release()
    time.sleep(10) # pause, to see the memory usage in top
    print('added one')
    existing_shm.close()

def create_shared_block():

    a = np.ones(shape=(dim, dim), dtype=np.int64)  # Start with an existing NumPy array

    shm = shared_memory.SharedMemory(create=True, size=a.nbytes)
    # Now create a NumPy array backed by shared memory
    np_array = np.ndarray(a.shape, dtype=np.int64, buffer=shm.buf)
    np_array[:] = a[:]  # Copy the original data into shared memory
    return shm, np_array

if current_process().name == "MainProcess":
    print("creating shared block")
    shr, np_array = create_shared_block()

    processes = []
    for i in range(cpu_count()):
        _process = Process(target=add_one, args=(shr.name,))
        processes.append(_process)
        _process.start()

    for _process in processes:
        _process.join()

    print("Final array")
    print(np_array[:10])
    print(np_array[-10:])

    shr.close()
    shr.unlink()

Note that because of the 64-bit ints this code can take about 1 GB of RAM to run, so make sure that you won't freeze your system using it. ^_^

If you want to make use of the copy-on-write feature and your data is static (unchanged in the child processes), you should keep Python from touching the memory blocks where your data lies. You can easily do this by using C or C++ structures (the STL, for instance) as containers and providing your own Python wrappers that use pointers to the data memory (or possibly copy it) when a Python-level object is created, if one is created at all. All this can be done very easily, with almost Python-level simplicity and syntax, with Cython.

# pseudo cython
from libc.stdlib cimport malloc, free
from libc.string cimport memcpy

cdef class FooContainer:
    cdef char * data

    def __cinit__(self, char * foo_value):
        self.data = <char *> malloc(1024 * sizeof(char))
        memcpy(self.data, foo_value, min(1024, len(foo_value)))

    def __dealloc__(self):
        free(self.data)

    def get(self):
        return self.data
# python part
import os
from foo import FooContainer

f = FooContainer(b"hello world")
pid = os.fork()
if not pid:
    f.get()  # this call reads the same memory page to which the
             # parent process wrote the 1024 chars of self.data;
             # Cython automatically creates a new Python bytes
             # object from it and returns it to the caller

The above pseudo-code is badly written. Don't use it. In place of self.data there should be a C or C++ container in your case.

For those interested in using Python 3.8's shared_memory module: it still has a bug which hasn't been fixed and affects Python 3.8/3.9/3.10 as of now (2021-01-15). The bug affects POSIX systems and is about the resource tracker destroying shared memory segments while other processes should still have valid access to them. So take care if you use it in your code.

You can use memcached or redis and set each as a key-value pair {'l1'...
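As a rough sketch of that idea (assuming a local redis server and the third-party redis-py package; the key names are made up), each large object is serialized once into the store and every sub-process reads back just the element it needs by key:

import redis  # third-party: pip install redis

r = redis.Redis()               # assumes a redis server on localhost:6379

# parent: store each element once, under a made-up key scheme
r.set("l1:0", b"\x01\x02\x03")  # e.g. bitarray 0 of l1, serialized to bytes

# any sub-process: fetch a copy of just the element it needs
raw = r.get("l1:0")

Note that the data then lives in the redis server's memory and every get transfers a copy of that one element to the calling process, so this trades true sharing for simple, on-demand access.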
