NumPy memory leak in np.ones?

One of my students showed me the following test case, which shows an apparent memory leak in NumPy. I'm wondering whether the memory profiler is correct here, or what's going on. Here's the test case:

from memory_profiler import profile
import numpy as np
import gc

@profile
def test():
    arr = np.ones((10000, 6912))
    for i in range(2000):
        arr[0:75,:] = np.ones((75, 6912))
    del arr
    gc.collect()
    pass

test()

This produces the following output:

Filename: test.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     32.9 MiB     32.9 MiB           1   @profile
     6                                         def test():
     7    560.3 MiB    527.4 MiB           1       arr = np.ones((10000, 6912))
     8    564.2 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.2 MiB      3.9 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     37.0 MiB   -527.3 MiB           1       del arr
    11     37.0 MiB     -0.0 MiB           1       gc.collect()
    12     37.0 MiB      0.0 MiB           1       pass

It looks like the line with np.ones((75, 6912)) is slowly leaking memory (about 4 MB here). If we replace this expression with just 1, the apparent leak disappears.
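As a rough sanity check on the profile, the sizes of the two arrays in test() can be multiplied out by hand. This sketch assumes float64, which is np.ones' default dtype (8 bytes per element); the helper name `array_bytes` is made up for illustration, and with NumPy available `np.ones(shape).nbytes` should report the same totals.

```python
# Back-of-the-envelope sizes for the two arrays in test(), assuming float64
# (np.ones' default dtype, 8 bytes per element).
MIB = 1024 * 1024
ITEMSIZE = 8


def array_bytes(shape, itemsize=ITEMSIZE):
    """Number of bytes in a dense array of the given shape (hypothetical helper)."""
    n = 1
    for dim in shape:
        n *= dim
    return n * itemsize


big = array_bytes((10000, 6912))  # arr, allocated once per call to test()
temp = array_bytes((75, 6912))    # temporary created on every loop iteration

print(f"arr:       {big:,} bytes = {big / MIB:.1f} MiB")
print(f"temporary: {temp:,} bytes = {temp / MIB:.2f} MiB")
print(f"2000 leaked temporaries would be {2000 * temp / MIB / 1024:.1f} GiB")
```

On these assumptions, the 527.4 MiB increment for line 7 roughly matches a single `arr` (about 527.3 MiB), and the 3.9 MiB increment for line 9 is about the size of one (75, 6912) temporary rather than 2000 of them (which would be about 7.7 GiB). That hints the loop is not losing a buffer on every iteration.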

I've tested this on Python 3.8.10 and 3.9.5 with NumPy versions 1.21.3 (latest at time of writing) and 1.20.3, and memory_profiler version 0.58.0 (latest at time of writing). My operating system is Ubuntu Linux 20.04 LTS; my student demonstrated this on macOS (not sure which version).

What's going on?

The short answer, though not yet one that adds anything new to the conversation, is that @hpaulj is right: there is no significant leak of anywhere near 4.1 MiB per call to test(). What is happening is that not all memory that gets allocated is returned to the OS. The reason is that both Python's arena-based allocator and libc malloc request memory from the OS in large ranges, then carve those ranges up into smaller regions to satisfy allocation requests. A larger range typically cannot be freed if any part of it is still in use. For example, a Python arena cannot be freed while any allocation from that arena has not yet been freed.
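To make the "cannot be freed while partly in use" point concrete, here is a toy model. It is a hypothetical simplification, not how pymalloc or glibc malloc are actually implemented: every arena has a fixed number of equal-sized slots, and an arena is handed back to the "OS" only when all of its slots are free. The class and method names are invented for this sketch.

```python
import itertools


class ToyArenaAllocator:
    """Hypothetical, heavily simplified model of an arena-based allocator.

    Each arena holds `slots_per_arena` equal-sized slots. A new arena is
    obtained from the "OS" only when no held arena has a free slot, and an
    arena is handed back only when every slot in it is free.
    """

    def __init__(self, slots_per_arena=4):
        self.slots_per_arena = slots_per_arena
        self.arenas = {}  # arena id -> set of slot indices currently in use
        self._next_id = itertools.count()

    def malloc(self):
        # Prefer a free slot in an arena we already hold.
        for arena_id, used in self.arenas.items():
            if len(used) < self.slots_per_arena:
                slot = min(set(range(self.slots_per_arena)) - used)
                used.add(slot)
                return arena_id, slot
        # Otherwise request a fresh arena from the "OS".
        arena_id = next(self._next_id)
        self.arenas[arena_id] = {0}
        return arena_id, 0

    def free(self, handle):
        arena_id, slot = handle
        used = self.arenas[arena_id]
        used.remove(slot)
        if not used:  # nothing left in use: return the whole arena
            del self.arenas[arena_id]

    def os_footprint(self):
        """Number of arenas currently held from the OS."""
        return len(self.arenas)

    def free_slots(self):
        """Free slots still held by the process (reusable, not returned)."""
        return sum(self.slots_per_arena - len(u) for u in self.arenas.values())


alloc = ToyArenaAllocator(slots_per_arena=4)
handles = [alloc.malloc() for _ in range(8)]  # fills two arenas
for h in handles[1:]:                         # free everything but one object
    alloc.free(h)
print(alloc.os_footprint(), alloc.free_slots())  # 1 3: arena 0 is pinned
print(alloc.malloc(), alloc.os_footprint())      # (0, 1) 1: free slot reused
```

A single live object pins its whole arena, and the freed slots beside it stay inside the process as reusable space; the core-file analysis later in this answer finds the same pattern at a much larger scale.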

You can make some tiny modifications to your program to see that test() is not leaking 4.1 MiB per call. For example, suppose you change the last line to these two lines:

while True:
    test()

If you then run the program and check the virtual address space used by that program (for example, using top or ps), you will see that it stops increasing almost immediately after the first run of test().
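If you would rather log numbers than watch top, a small helper can sample the process's own memory after each call. This is a Linux-only sketch that assumes /proc/self/status is available (VmSize is the virtual size that top shows as VIRT, VmRSS the resident size, both in kB); the helper names are hypothetical, and it runs a bounded number of calls instead of `while True`.

```python
def parse_status_field(status_text, field):
    """Return a numeric field such as 'VmSize' or 'VmRSS' (reported in kB)
    from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith(field + ":"):
            return int(line.split()[1])
    raise KeyError(field)


def watch_memory(fn, iterations=10, status_path="/proc/self/status"):
    """Call fn repeatedly; record and print VmSize/VmRSS after each call."""
    samples = []
    for i in range(iterations):
        fn()
        with open(status_path) as f:
            text = f.read()
        vm_size = parse_status_field(text, "VmSize")
        vm_rss = parse_status_field(text, "VmRSS")
        samples.append((vm_size, vm_rss))
        print(f"call {i + 1}: VmSize={vm_size} kB VmRSS={vm_rss} kB")
    return samples
```

For example, `watch_memory(test, iterations=20)`; removing the @profile decorator first keeps memory_profiler's per-call report out of the way. If the explanation above holds, both numbers should level off after the first call.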

Even using the metrics provided by memory_profiler, you can see this by changing your original program so that it just calls test() twice, as in:

test()
test()

If you then run your program, you will see that the reported growth occurs only during the first call:

tim@tim-OptiPlex-3020:~$ python3 so3.py
Filename: so3.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     32.9 MiB     32.9 MiB           1   @profile
     6                                         def test():
     7    560.0 MiB    527.1 MiB           1       arr = np.ones((10000, 6912))
     8    564.1 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.1 MiB      4.1 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     36.9 MiB   -527.3 MiB           1       del arr
    11     36.8 MiB     -0.0 MiB           1       gc.collect()
    12     36.8 MiB      0.0 MiB           1       pass


Filename: so3.py

Line #    Mem usage    Increment  Occurences   Line Contents
============================================================
     5     36.8 MiB     36.8 MiB           1   @profile
     6                                         def test():
     7    564.1 MiB    527.3 MiB           1       arr = np.ones((10000, 6912))
     8    564.1 MiB      0.0 MiB        2001       for i in range(2000):
     9    564.1 MiB      0.0 MiB        2000           arr[0:75,:] = np.ones((75, 6912))
    10     36.8 MiB   -527.3 MiB           1       del arr
    11     36.8 MiB      0.0 MiB           1       gc.collect()
    12     36.8 MiB      0.0 MiB           1       pass

So the next question you might ask is why the memory grows during the first call to test() but apparently not during the second. To answer that question we can use https://github.com/vmware/chap, which is open source and can be compiled by your students on Linux.

As input, chap generally just requires a core file. In this particular case, we want at least 2 core files because we want to know which allocations were made during the first call to test() but never freed.

To do that, we can modify the program to sleep between the calls to test(), to give us time to gather the core files. After this slight modification, the revised program looks like this:

from time import sleep
from memory_profiler import profile
import numpy as np
import gc

@profile
def test():
    arr = np.ones((10000, 6912))
    for i in range(2000):
        arr[0:75,:] = np.ones((75, 6912))
    del arr
    gc.collect()
    pass

# Each sleep() below leaves a 120-second window for gathering a core with gcore.
print('sleep before first test()')
sleep(120)
test()
print('sleep before second test()')
sleep(120)
test()
print('sleep after second test()')
sleep(120)

With those modifications, we can run the program in the background and gather one core before the first call to test(), one core before the second call to test(), and one core after the second call to test().

First, as an administrative detail, we set the coredump_filter used by the shell to 0x37. The python process we start from that shell inherits this coredump_filter value, so the cores we create will include information about file-backed memory.

tim@tim-OptiPlex-3020:~$ cat /proc/self/coredump_filter
00000033
tim@tim-OptiPlex-3020:~$ echo 0x37 >/proc/self/coredump_filter
tim@tim-OptiPlex-3020:~$ cat /proc/self/coredump_filter
00000037
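The two hex masks can be decoded bit by bit. The bit meanings below are the first seven listed for coredump_filter in the Linux core(5) man page (higher bits are omitted); the function is only an illustrative helper. Decoding shows that going from 0x33 to 0x37 sets exactly one extra bit, the one for file-backed private mappings.

```python
# Meanings of the low bits of /proc/<pid>/coredump_filter, per core(5).
COREDUMP_FILTER_BITS = {
    0: "anonymous private mappings",
    1: "anonymous shared mappings",
    2: "file-backed private mappings",
    3: "file-backed shared mappings",
    4: "ELF headers",
    5: "private huge pages",
    6: "shared huge pages",
}


def decode_coredump_filter(value):
    """Return descriptions of the known bits set in a coredump_filter mask.

    Accepts an int or a hex string such as "00000033" or "0x37".
    """
    if isinstance(value, str):
        value = int(value, 16)
    return [desc for bit, desc in sorted(COREDUMP_FILTER_BITS.items())
            if value & (1 << bit)]


default = decode_coredump_filter("00000033")
extended = decode_coredump_filter("00000037")
print("0x33:", default)
print("added by 0x37:", [d for d in extended if d not in default])
```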

Now we are ready to start the program in the background and gather the first core while the program does the first sleep().

tim@tim-OptiPlex-3020:~$ python3 so4.py &
[2] 125315
tim@tim-OptiPlex-3020:~$ sleep before first test()
sudo gcore -o beforeFirst 125315
[sudo] password for tim: 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile beforeFirst.125315
[Inferior 1 (process 125315) detached]

Then we wait until the first call to test() has finished and gather another core while the program does the second sleep().

sleep before second test()
sudo gcore -o beforeSecond 125315
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile beforeSecond.125315
[Inferior 1 (process 125315) detached]

Then we wait for the second call to test() to complete and gather a third core while the program does the third sleep().

sleep after second test()
sudo gcore -o afterSecond 125315
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
0x00007f25bbbb012b in select () from /lib/x86_64-linux-gnu/libc.so.6
warning: target file /proc/125315/cmdline contained unexpected null characters
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ad7d8000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ada0c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25adc23000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ade39000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae051000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae2a7000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae522000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae74b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25ae9d2000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aec50000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25aef3c000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af145000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25af41b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b708b000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b7494000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9358000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b99e3000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9cc4000.
warning: Memory read failed for corefile section, 1048576 bytes at 0x7f25b9eca000.
warning: Memory read failed for corefile section, 4096 bytes at 0xffffffffff600000.
Saved corefile afterSecond.125315
[Inferior 1 (process 125315) detached]
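The three manual gcore steps above could also be scripted. The sketch below is a hypothetical driver, not something used in this answer: it starts the program with `python3 -u` (so each marker line reaches the pipe immediately instead of waiting in a block buffer) and runs `sudo gcore -o PREFIX PID` each time one of the marker lines printed by so4.py appears. It assumes it is launched from the shell whose coredump_filter was already set to 0x37, since child processes inherit that value.

```python
import subprocess

# Marker lines printed by so4.py, mapped to the core-file prefixes used above.
MARKERS = {
    "sleep before first test()": "beforeFirst",
    "sleep before second test()": "beforeSecond",
    "sleep after second test()": "afterSecond",
}


def gcore_commands(lines, pid):
    """Yield one gcore command line for every marker line seen in `lines`."""
    for line in lines:
        prefix = MARKERS.get(line.strip())
        if prefix is not None:
            yield ["sudo", "gcore", "-o", prefix, str(pid)]


def gather_cores(script="so4.py"):
    # -u: unbuffered stdout, so each marker arrives while the program sleeps.
    proc = subprocess.Popen(["python3", "-u", script],
                            stdout=subprocess.PIPE, text=True)
    # gcore_commands is lazy, so each core is taken before the next line is read.
    for cmd in gcore_commands(proc.stdout, proc.pid):
        subprocess.run(cmd, check=True)
    return proc.wait()
```

Lines that are not markers (including memory_profiler's reports) are simply skipped.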

Now we are ready to analyze the cores using chap, which takes a core file as input. With respect to the size of a process, the most interesting memory is the writable ranges, and we can get some details about them by running chap on the core from before the first call to test().

tim@tim-OptiPlex-3020:~$ chap beforeFirst.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x51c000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8a31000 (144,904,192) bytes.

In the output above, notice the line with "python arena". That one is associated with Python's arena-based allocator. Also notice the lines with "libc malloc main arena pages" and "libc malloc mmapped allocation". Not surprisingly, those are associated with libc malloc, which is used both by native libraries and, in some cases, by Python itself, such as when an allocation exceeds a certain size.

As I mentioned earlier, these large ranges are carved up to satisfy smaller allocations. We can get a count of the used allocations (those that have not been freed) and of the free allocations (which occupy space that has not yet been given back to the OS and can be used for future allocations).

chap> count used
114423 allocations use 0xf4df58 (16,047,960) bytes.
chap> count free
730 allocations use 0x5fb30 (391,984) bytes.

Now we can compare by using the same 3 commands in chap on the second core. What we see is that all of the growth appeared in the range summarized as used by "libc malloc main arena pages", which grew from 0x51c000 to 0x926000 bytes, an increase of slightly more than 4 MiB.

tim@tim-OptiPlex-3020:~$ chap beforeSecond.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x926000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8e3b000 (149,139,456) bytes.
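The hex sizes in the two summaries can be checked with a few lines of arithmetic. The values are copied from the chap output above; everything else is plain integer math.

```python
MIB = 1024 * 1024

# Sizes copied from the two "summarize writable" outputs.
arena_pages_before = 0x51c000   # libc malloc main arena pages, beforeFirst
arena_pages_after = 0x926000    # libc malloc main arena pages, beforeSecond
writable_before = 0x8a31000     # total writable, beforeFirst
writable_after = 0x8e3b000      # total writable, beforeSecond

arena_growth = arena_pages_after - arena_pages_before
total_growth = writable_after - writable_before

print(hex(arena_growth), f"{arena_growth:,} bytes", f"{arena_growth / MIB:.3f} MiB")
print("all writable growth is in that range:", arena_growth == total_growth)
```

The total writable size grew by exactly the same 0x40a000 bytes (about 4.04 MiB) as the "libc malloc main arena pages" range, which is what "all of the growth appeared in that range" amounts to.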

If we drill down further, we can see that the used allocations grew by a bit less than 100,000 bytes and the free allocations grew by about 4 MiB.

chap> count used
114686 allocations use 0xf64ac8 (16,141,000) bytes.
chap> count free
1312 allocations use 0x4522e8 (4,530,920) bytes.
chap> 

This basically confirms @hpaulj's theory, except that there was a little growth in used allocations during the first run of test(). It would probably be interesting to understand that growth, but for now I'll just note that the bulk of the growth is explained by free allocations. This is not bad, because those areas of memory are available for reuse.

So now we check what happened during the second run of test(). The process didn't get larger, but there is one more used allocation and very slightly less memory used by free allocations.

tim@tim-OptiPlex-3020:~$ chap afterSecond.125315
chap> summarize writable
12 ranges take 0x603e000 bytes for use: unknown
3 ranges take 0x1800000 bytes for use: cached pthread stack
41 ranges take 0xa40000 bytes for use: python arena
1 ranges take 0x926000 bytes for use: libc malloc main arena pages
47 ranges take 0x1e5000 bytes for use: used by module
1 ranges take 0x91000 bytes for use: libc malloc mmapped allocation
1 ranges take 0x21000 bytes for use: main stack
106 writable ranges use 0x8e3b000 (149,139,456) bytes.
chap> count used
114687 allocations use 0xf64ca8 (16,141,480) bytes.
chap> count free
1249 allocations use 0x452148 (4,530,504) bytes.
chap> 

So the second run of test() reused memory that had been left free after the first run, then freed most of those allocations again once they were no longer needed. This is working as expected.
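For reference, the used and free counts reported by chap for the three cores can be tabulated and differenced. The numbers are copied from the `count used` and `count free` outputs above; the dictionary layout and the `delta` helper are just one way to organize them.

```python
# (allocation count, bytes) from chap's "count used" and "count free".
snapshots = {
    "beforeFirst":  {"used": (114423, 16_047_960), "free": (730, 391_984)},
    "beforeSecond": {"used": (114686, 16_141_000), "free": (1312, 4_530_920)},
    "afterSecond":  {"used": (114687, 16_141_480), "free": (1249, 4_530_504)},
}


def delta(earlier, later, kind):
    """Change in (count, bytes) of `kind` ("used" or "free") between two cores."""
    count_a, bytes_a = snapshots[earlier][kind]
    count_b, bytes_b = snapshots[later][kind]
    return count_b - count_a, bytes_b - bytes_a


for earlier, later in [("beforeFirst", "beforeSecond"), ("beforeSecond", "afterSecond")]:
    for kind in ("used", "free"):
        count, nbytes = delta(earlier, later, kind)
        print(f"{earlier} -> {later}, {kind}: {count:+} allocations, {nbytes:+,} bytes")
```

Across the first call, used allocations rise by 263 (93,040 bytes) while free allocations rise by 582 (4,138,936 bytes); across the second call, used allocations rise by 1 (480 bytes) and free allocations drop by 63 (416 bytes).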

One might still ask for an explanation of the extra used allocations after the first call to test(), and of the one extra used allocation after the second call. It is possible to find out using the existing core files, but I will stop here because that takes more time, and I have already shown the following:

  1. The memory profiler is basically correct that the program grew by roughly 4 MiB during the first run.
  2. This is at least mostly not a problem, because most of the additional memory is free within the process and available for future allocations.
  3. There may or may not be some very small growth after the first call to test(), but if so it must be extremely slow, because one can run test() in a loop without any obvious signs of growth. If someone decides to run test() in a loop for a really long time and observes that the process does in fact eventually get larger, feel free to ask another question about why, and I will continue the analysis as an answer to that question.

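Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.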

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM