Cannot free a numpy array

I am trying to remove a memory bottleneck in my program. Here is the relevant part:

print_mem_info()
print("creating array")
arr = np.empty(vol_to_write.get_shape(), dtype=np.float16)
for v_tmp, a_tmp in zip(v_list, a_list):
    s = to_basis(v_tmp, vol_to_write).get_slices()
    arr[s[0][0]:s[0][1],s[1][0]:s[1][1],s[2][0]:s[2][1]] = copy.deepcopy(a_tmp)
print_mem_info()
print("deleting array")
del arr
print_mem_info()

Here is the output:

Used RAM:  4217.71875 MB
creating array
Used RAM:  4229.68359375 MB
deleting array
Used RAM:  4229.2890625 MB

For print_mem_info I am just using the psutil library:

import psutil

def print_mem_info():
    mem = psutil.virtual_memory()
    swap = psutil.swap_memory()
    used_ram = (mem.total - mem.available) / 1024 / 1024
    used_swap = swap.used / 1024 / 1024
    print("Used RAM: ", used_ram, "MB")
    # print("Used swap: ", used_swap, "MB")

I am just creating a numpy array, filling it, and then I want to delete it (in the real program it is deleted later, but for debugging purposes I put the del here). What I cannot understand is why del does not remove the array from RAM, since there are no other references to it. I tried gc.collect() and it did nothing.

I read a lot of other posts on Stack Overflow but could not figure it out. I know that gc.collect() is not supposed to be used, and I read somewhere that using del is not recommended, but I am manipulating very big numpy arrays, so I cannot just leave them in RAM.


[edit]:

I tried creating a minimal example here:

import numpy as np
import psutil, os

def print_mem_info():
    process = psutil.Process(os.getpid())
    print(process.memory_info().vms // 1024 // 1024)

if __name__ == "__main__":
    print("program starts")
    print_mem_info()

    print("creating samples...")
    a_list = list()
    for i in range(4):
        a_list.append(np.random.rand(100,100,100))
    print_mem_info()

    print("creating array...")
    arr = np.empty((400,100,100))
    print_mem_info()

    print("filling the array...")
    for i, a_tmp in enumerate(a_list):
        arr[i*100:(i+1)*100,:,:] = a_tmp
        del a_tmp
    print_mem_info()

    print("deleting the array...")
    del arr
    print_mem_info()

You are measuring memory at the system level, not at the process level. You don't know what all the other processes on your machine are doing.
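To make the difference concrete, here is a minimal sketch that puts the two measurements side by side, assuming psutil is installed. The helper names system_used_mib and process_vms_mib are made up for this illustration; only the psutil calls are real.

```python
import os

import psutil


def system_used_mib():
    """RAM in use by ALL processes on the machine, in MiB (hypothetical helper)."""
    mem = psutil.virtual_memory()
    return (mem.total - mem.available) / 1024 / 1024


def process_vms_mib():
    """Virtual memory size of the current process only, in MiB (hypothetical helper)."""
    process = psutil.Process(os.getpid())
    return process.memory_info().vms / 1024 / 1024


if __name__ == "__main__":
    # The first number moves whenever any program on the machine
    # allocates or frees memory; the second only reflects this process.
    print("system-wide used RAM:", system_used_mib(), "MiB")
    print("this process (VMS):  ", process_vms_mib(), "MiB")
```

The first value can drift by several MB between two calls even if your script does nothing, which is enough to hide a small array being freed.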

Be careful with the example code for measuring memory of a process. Many of the examples there mix up virtual memory and physical memory.

RSS (Linux term) and Working Set (Windows term) are not good for discussing your problem, because they only count the part of memory that is currently in physical RAM. Since that depends heavily on how much physical RAM you have, it will vary between machines and is not comparable.

VMS (Linux term) or Private Bytes (Windows term) are much more reliable, since they also count memory that is in use but has been swapped to disk because there is not enough physical RAM.

The following code should help you get started:

import numpy as np
import psutil
import os

def print_mem_info():
    process = psutil.Process(os.getpid())
    print(process.memory_info().vms // 1024 // 1024)

print_mem_info()
arr = np.empty((100000,100000))
print_mem_info()
del arr
print_mem_info()

On my machine, it prints

261
76705
262

The jump of roughly 76,000 MB (about 76 GB) sounds plausible for an array of 100,000 * 100,000 items at 8 bytes each: that is 8 * 10^10 bytes, or about 76,294 MiB.
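The arithmetic can be checked with a few lines of plain Python. This is only a sketch; expected_mib is a hypothetical helper, and the item size of 8 bytes assumes numpy's default float64 dtype.

```python
def expected_mib(shape, itemsize):
    """Size in MiB of an array with the given shape and bytes per item."""
    count = 1
    for dim in shape:
        count *= dim
    return count * itemsize / 1024 / 1024


# The array from the code above: 100000 x 100000 items, 8 bytes each.
print(expected_mib((100000, 100000), 8))  # 76293.9453125
```

The measured increase (76705 - 261 = 76444) is slightly larger than this, which is not surprising since the process may allocate other things in between.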

With RSS, the effect is not visible:

47
47
47
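A likely reason RSS stays flat is that np.empty only reserves the memory and never writes to it, so the operating system has not yet backed those pages with physical RAM. The following sketch, assuming numpy and psutil are installed, records RSS before and after actually filling the array. The helpers rss_mib and rss_trace are hypothetical names, and the exact numbers depend on the operating system and the allocator, so no output is shown.

```python
import os

import numpy as np
import psutil


def rss_mib():
    """Resident set size of the current process in MiB (hypothetical helper)."""
    return psutil.Process(os.getpid()).memory_info().rss / 1024 / 1024


def rss_trace(shape=(8192, 1024)):
    """Record RSS at each step; the default shape is 64 MiB of float64."""
    trace = {"start": rss_mib()}
    arr = np.empty(shape)   # reserves memory, does not write to it
    trace["after_empty"] = rss_mib()
    arr.fill(1.0)           # writes every element, so the pages become resident
    trace["after_fill"] = rss_mib()
    del arr                 # drops the last reference to the array
    trace["after_del"] = rss_mib()
    return trace


if __name__ == "__main__":
    for step, value in rss_trace().items():
        print(step, round(value), "MiB")
```

If RSS rises after the fill and falls again after the del, the array really was released, and the numbers in the question were most likely noise from the system-wide measurement.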
