multiprocessing.Array allocation in memory

I am creating a multiprocessing.Array and checking its size in memory with htop. I am also measuring how long it takes to access the last element of the array.

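import multiprocessing as mp
import time

import numpy as np

X = 1  # array size in GB
# int() keeps the length an integer when X is fractional (e.g. 0.5)
mp_array = mp.Array('B', int(X * 1024**3), lock=True)
# View the shared buffer as a NumPy array without copying it
np_array = np.frombuffer(mp_array.get_obj(), dtype='B')

# Time 10**7 reads of the last element
t1 = time.perf_counter()
for _ in range(10**7):
    a = np_array[-1]
print(time.perf_counter() - t1)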

When I create a 'small' array (1 GB), htop shows the memory in use (the green bar) increasing by 1 GB. The same happens with 3 or 6 GB, so the whole array goes to memory in use. My PC has 32 GB of RAM, so I tried creating a 20 GB array. In that case the green memory stays at the value it had before creating the array, and all of the array goes to the yellow one (cache, from what I have read). I even tried creating an array bigger than my total RAM, and it worked, so I don't really know what is happening in the background (it is not using swap). The timings are basically the same in each scenario: around 1.5 s for the loop.

This is just a test. In the program I am working on, I create six independent shared-memory arrays of X GB each on a machine with 8 GB of RAM. (For context: each array is a buffer for a camera stream.)

I tried X = 0.5, which should use 3 GB in total, and it works as intended, but with X = 1 a fraction of the memory goes to cache. I guess the first few arrays are allocated in 'active' memory and the rest in the cache? In my crude testing I saw no time difference when accessing either type of memory, but I really don't understand what is happening here, nor why I can create an array larger than my total RAM.

Any insight you can give me?

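sharedctypes allocates memory from chunks called "Arena"s, which are memory-mapped files. A best effort is made to create these files in a directory that is not backed by storage (a part of the filesystem that only ever lives in RAM, such as /dev/shm on Linux) for best performance. The available space is checked before the file is created, however, so if there isn't enough room in the in-memory directory, the temp file may be created elsewhere, in a directory that does reside on disk. That would explain why an array larger than total RAM can still be created: a disk-backed file is limited by disk space rather than by RAM, and its pages show up as cache rather than as memory in use. The buffer is then eventually taken as a memoryview of the mmap: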

multiprocessing/heap.py

else: #if sys.platform != 'win32':

    class Arena(object):
        """
        A shared memory area backed by a temporary file (POSIX).
        """

        if sys.platform == 'linux':
            _dir_candidates = ['/dev/shm']
        else:
            _dir_candidates = []

        def __init__(self, size, fd=-1):
            self.size = size
            self.fd = fd
            if fd == -1:
                # Arena is created anew (if fd != -1, it means we're coming
                # from rebuild_arena() below)
                self.fd, name = tempfile.mkstemp(
                     prefix='pym-%d-'%os.getpid(),
                     dir=self._choose_dir(size))
                os.unlink(name)
                util.Finalize(self, os.close, (self.fd,))
                os.ftruncate(self.fd, size)
            self.buffer = mmap.mmap(self.fd, self.size)

        def _choose_dir(self, size):
            # Choose a non-storage backed directory if possible,
            # to improve performance
            for d in self._dir_candidates:
                st = os.statvfs(d)
                if st.f_bavail * st.f_frsize >= size:  # enough free space?
                    return d
            return util.get_temp_dir()
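
To see which side of that check a given array would land on, the same free-space test can be run by hand. This is only a rough sketch that mirrors _choose_dir above: predict_arena_dir and free_bytes are hypothetical helper names, not part of multiprocessing, and tempfile.gettempdir() is used as a stand-in for util.get_temp_dir(), which creates its own directory under the system temp location.

```python
import os
import tempfile


def free_bytes(path):
    """Free space (in bytes) available to unprivileged users on path's filesystem."""
    st = os.statvfs(path)
    return st.f_bavail * st.f_frsize


def predict_arena_dir(size, candidates=("/dev/shm",), fallback=None):
    """Hypothetical helper: mimic the directory choice made by Arena._choose_dir."""
    for d in candidates:
        # Same test as in heap.py: is there enough free space for the whole file?
        if os.path.isdir(d) and free_bytes(d) >= size:
            return d
    # Assumption: stand-in for util.get_temp_dir(), usually on regular storage.
    return fallback if fallback is not None else tempfile.gettempdir()


if __name__ == "__main__":
    for gib in (1, 6, 20):
        size = gib * 1024**3
        print(f"{gib:>3} GiB -> {predict_arena_dir(size)}")
```

On many Linux systems /dev/shm is a tmpfs sized at about half of the physical RAM by default, so on a 32 GB machine a 20 GB request would likely fail the check and fall back to the temp directory. The exact limit depends on how the system is configured, and df -h /dev/shm shows the actual value.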
