I have a class with a cache, implemented as a dict of numpy arrays, which can occupy GBs of memory.
from typing import Dict, Tuple
import numpy as np

class WorkOperations(object):
    def __init__(self):
        self.data_cache: Dict[str, Dict[str, Tuple[np.ndarray, np.ndarray]]] = {}

    def get_data(self, key):
        # Compute and cache the result on first access
        if key not in self.data_cache:
            self.add_data(key)
        return self.data_cache[key]

    def add_data(self, key):
        result = run_heavy_calculation(key)
        self.data_cache[key] = result
I am testing the code with this function:
import gc

def perform_operations():
    work_operations = WorkOperations()
    # input_keys gives a list of keys to process
    for key in input_keys():
        data = work_operations.get_data(key)
        do_some_operation(data)
    del work_operations

perform_operations()
gc.collect()
The result of run_heavy_calculation is heavy in memory, and soon data_cache grows to occupy GBs of memory (which is expected).
But the memory is not released even after perform_operations() is done. I tried adding del work_operations and invoking gc.collect(), but that did not help either. I checked the memory of the process after several hours, and it still had not been freed.
If I don't use caching (data_cache) at all (at the cost of latency), memory usage never gets high.
I am wondering what is taking up the memory. I tried running tracemalloc, but it only showed lines occupying KBs of memory. I also took a memory dump with gdb, using the memory addresses from the process's pmap output and /proc/<pid>/smaps, but the dump is really long, and even with a hex editor I couldn't figure out much.
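For reference, the kind of tracemalloc check described above can be sketched as a comparison of two snapshots taken around the cache fill. This is only an illustration: fake_heavy_calculation is a hypothetical stand-in for run_heavy_calculation that allocates a plain bytearray, and the sizes are arbitrary.

```python
import tracemalloc

def fake_heavy_calculation(key, size=5_000_000):
    # Hypothetical stand-in for run_heavy_calculation: allocates `size` bytes.
    return bytearray(size)

def traced_growth(keys):
    """Return (bytes of traced memory gained, number of cached entries)."""
    tracemalloc.start()
    before = tracemalloc.take_snapshot()
    cache = {key: fake_heavy_calculation(key) for key in keys}
    after = tracemalloc.take_snapshot()
    # Sum the per-line size differences between the two snapshots.
    growth = sum(stat.size_diff for stat in after.compare_to(before, "lineno"))
    tracemalloc.stop()
    return growth, len(cache)

growth, cached = traced_growth(["a", "b", "c"])
print(f"cached {cached} entries, traced growth {growth / 1e6:.1f} MB")
```

tracemalloc only reports allocations made through allocators it hooks, so a result in the KB range while RES is in the GB range suggests the memory is held somewhere it does not see.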
I am measuring the memory used by the process with the top command, looking at RES. I also tried logging the memory at the end from within the Python process:
import psutil
import gc
import logging

GIGABYTE = 1024.0 * 1024.0 * 1024.0

perform_operations()
gc.collect()
memory_full_info = psutil.Process().memory_full_info()
logging.info(f"process memory after running the process {memory_full_info.uss / GIGABYTE}")
I could not reproduce this on Ubuntu with the following:
import itertools
import time
import os
from typing import Dict, Tuple

import numpy as np
import psutil  # installed with pip

process = psutil.Process(os.getpid())
SIZE = 10**7

def run_heavy_calculation(key):
    array1 = np.zeros(SIZE)
    array2 = np.zeros(SIZE)
    # Linux uses "virtual memory", which means that the memory required for the arrays was allocated, but will not
    # be actually deducted until we use them, so we write some 1s into them
    # cf: https://stackoverflow.com/q/29516888/11384184
    for i in range(0, SIZE, 1000):
        array1[i] = 1
        array2[i] = 1
    return {key: (array1, array2)}

class WorkOperations(object):
    def __init__(self):
        self.data_cache: Dict[str, Dict[str, Tuple[np.ndarray, np.ndarray]]] = {}

    def get_data(self, key):
        if key not in self.data_cache:
            self.add_data(key)
        return self.data_cache[key]

    def add_data(self, key):
        result = run_heavy_calculation(key)
        self.data_cache[key] = result

def perform_operations(input_keys):
    work_operations = WorkOperations()
    for key in input_keys():
        data = work_operations.get_data(key)
        time.sleep(0.2)
        print(key, process.memory_info().rss / 10**9)
    del work_operations

perform_operations(lambda: map(str, itertools.product("abcdefgh", "0123456789")))  # dummy keys
print("after operations", process.memory_info().rss / 10**9)
input("pause")
('a', '0') 0.113106944
('a', '1') 0.195014656
('a', '2') 0.276926464
...
('h', '7') 6.421118976
('h', '8') 6.503030784
('h', '9') 6.584942592
after operations 0.031363072
pause
Memory usage climbed to about 6.5 GB of RAM, then the function returned and all of it was released.
You can add a finalizer (__del__) to the class WorkOperations:
    def __del__(self):
        print("deleted")
I see it printed between the last operation's print and the "after operations" one.
Although this does not guarantee that it will always be the case (cf. this question), it strongly indicates that everything is working as intended: even without the del, the function returns (hence the scope is lost), so the reference count for work_operations drops to 0 and the object is then garbage-collected.
It can be checked with sys.getrefcount too:
print(sys.getrefcount(work_operations) - 1) # see https://stackoverflow.com/a/510417/11384184
del work_operations
which for me prints 1.
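Another way to confirm that the object is released once the function returns is a weak reference, which does not keep its target alive. The following is only a minimal sketch, assuming CPython (where reference counting frees the object immediately). TinyWorkOperations is a hypothetical, trimmed-down stand-in for WorkOperations, and a bytearray replaces the numpy arrays.

```python
import weakref

class TinyWorkOperations:
    # Hypothetical stand-in for WorkOperations, holding only the cache dict.
    def __init__(self):
        self.data_cache = {}

def perform_operations():
    work_operations = TinyWorkOperations()
    work_operations.data_cache["key"] = bytearray(10**6)
    # A weak reference lets us observe the object without keeping it alive.
    return weakref.ref(work_operations)

ref = perform_operations()
# On CPython the object is freed as soon as the function's local scope is gone,
# so dereferencing the weak reference yields None.
print("released:", ref() is None)
```

If this reported that the object was still alive in the real program, something else would be holding a reference to the cache, which is worth checking before looking at the allocator.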
Please provide a Minimal Reproducible Example and info on your system.