
multiprocessing.Array allocation in memory

I am creating a multiprocessing.Array and checking its size in memory with htop. I am also measuring how long it takes to access the last element of the array.

import multiprocessing as mp
import numpy as np
import time
X = 1 # 1 GB
mp_array = mp.Array('B', X * 1024 * 1024 * 1024, lock=True)
np_array = np.frombuffer(mp_array.get_obj(), dtype='B')

t1 = time.time()
for i1 in range(10**7):
    a = np_array[-1]
print(time.time() - t1)

When I create a 'small' array, I see in htop how the memory increases by 1 GB; the same happens if the size is 3 or 6 GB, so the whole array goes to memory in use (the green bar). I have 32 GB on my PC, so I tried creating a 20 GB array, but when I do, the green memory stays at the same value as before creating the array, and all of it goes to the yellow bar (cache, from what I have read). I even tried creating an array bigger than my total RAM, and it worked, so I don't really know what is happening in the background (it is not using swap). The times are basically the same in each scenario, taking around 1.5 s for the loop.
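One piece of the "bigger than my RAM" puzzle is demand paging: an mmap-backed allocation only reserves address space, and physical pages are committed the first time they are written. A minimal sketch of that behaviour (my own illustration, not from the question; it uses a small anonymous mapping so it runs anywhere, but on a typical 64-bit Linux box the same call succeeds for sizes well beyond physical RAM):

```python
import mmap

# An anonymous mapping reserves address space without committing RAM.
size = 256 * 1024 * 1024  # 256 MiB, kept small so the sketch runs anywhere
m = mmap.mmap(-1, size)   # creating the mapping allocates (almost) no RAM

m[0] = 1                  # first write: one physical page is committed
m[size - 1] = 2           # one more page at the far end; the rest stays unbacked
first, last = m[0], m[size - 1]
m.close()

print(first, last)  # 1 2
```

Only the touched pages show up as resident, which is why htop can report far less memory in use than the nominal size of the array.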

Now, this is just testing. In the program I am working on, I create six independent shared-memory arrays of X GB each on a machine with 8 GB. (Just for context: each array is a buffer for a camera stream.)

I tried with X = 0.5, so it should use 3 GB total, and it works as intended, but when I go with X = 1, a fraction of it goes to cache. I guess the first few arrays are allocated in the 'active' memory and the rest in the cache? In my crude testing, I saw no time difference when accessing either kind of memory, but I really don't understand what is happening here, nor why I can create an array larger than my total RAM.

Any insight you can give me?

sharedctypes allocates memory from chunks referred to as "Arena"s, which are memory-mapped files. A best effort is made to create said files in non-storage-backed directories (part of your filesystem lives only in RAM) for best performance. The available space is checked before the file is made, however, so if there isn't enough space in the in-memory temp folder, the temp file may be created elsewhere, in a location that does reside on disk. The buffer is then taken as a memoryview of the mmap:

multiprocessing/heap.py

else: #if sys.platform != 'win32':

    class Arena(object):
        """
        A shared memory area backed by a temporary file (POSIX).
        """

        if sys.platform == 'linux':
            _dir_candidates = ['/dev/shm']
        else:
            _dir_candidates = []

        def __init__(self, size, fd=-1):
            self.size = size
            self.fd = fd
            if fd == -1:
                # Arena is created anew (if fd != -1, it means we're coming
                # from rebuild_arena() below)
                self.fd, name = tempfile.mkstemp(
                     prefix='pym-%d-'%os.getpid(),
                     dir=self._choose_dir(size))
                os.unlink(name)
                util.Finalize(self, os.close, (self.fd,))
                os.ftruncate(self.fd, size)
            self.buffer = mmap.mmap(self.fd, self.size)

        def _choose_dir(self, size):
            # Choose a non-storage backed directory if possible,
            # to improve performance
            for d in self._dir_candidates:
                st = os.statvfs(d)
                if st.f_bavail * st.f_frsize >= size:  # enough free space?
                    return d
            return util.get_temp_dir()
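The directory-selection step above can be exercised on its own. Here is a standalone sketch (the helper name `choose_dir` is mine) that mirrors `Arena._choose_dir`: it returns a RAM-backed candidate such as `/dev/shm` only if that filesystem reports enough free space, and otherwise falls back to the regular temp directory, which may live on disk:

```python
import os
import tempfile

def choose_dir(size, candidates=('/dev/shm',)):
    # Mirrors Arena._choose_dir: prefer a RAM-backed directory,
    # but only if it reports enough free space for the arena file.
    for d in candidates:
        try:
            st = os.statvfs(d)
        except (OSError, AttributeError):  # dir missing, or non-POSIX platform
            continue
        if st.f_bavail * st.f_frsize >= size:
            return d
    return tempfile.gettempdir()

# A petabyte-sized request cannot fit in /dev/shm, so it falls through to
# the disk-backed temp dir -- presumably the same path your 20 GB array
# takes when /dev/shm (commonly sized at half of RAM) is too small.
print(choose_dir(10**15))
```

This is consistent with what you observed: small arrays land in the RAM-backed tmpfs and show up immediately as used memory, while oversized ones are backed by an ordinary on-disk temp file, whose pages the kernel accounts for as page cache rather than process-resident memory.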
