

Yet another Memory Leak Issue (memory is still gone when program terminates) - C program on SLES

I run my C program on SUSE Linux Enterprise; it compresses several thousand large files (between 10MB and 100MB in size), and it gets slower and slower as it runs (it's multi-threaded, running 32 threads on an Intel Sandy Bridge board). When the program completes and is run again, it's still very slow.

When I watch the program running, I see memory being depleted as it runs, which you would think is just a classic memory leak. But with an ordinary malloc()/free() mismatch, I would expect all the memory to come back when the program terminates, and most of it is not reclaimed when the program completes. The free or top command shows Mem: 63996M total, 63724M used, 272M free when the program has slowed almost to a halt, but after termination free memory only grows back to about 3660M. When the program is rerun, the free memory is quickly used up.

top shows that the program, while running, uses at most about 4% of the memory.

I thought it might be a memory fragmentation problem, but I built a small test program that simulates all the memory allocation activity in the program (with many randomized aspects built in: sizes and quantities), and it always returns all the memory on completion. So I don't think that's it.

Questions:

  1. Can there be a malloc()/free() mismatch that loses memory permanently, i.e. even after the process completes?

  2. What other things in a C program (not C++) can cause permanent memory loss, i.e. memory still lost after the program completes and even after the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but I don't think I have that problem.

  3. Is it valid to look at top and free for memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.

  4. If top only shows the program using about 4% of memory, will something like valgrind find this problem?

Can there be a malloc()/free() mismatch that loses memory permanently, i.e. even after the process completes?

No. malloc and free, and even mmap, are harmless in this respect: when the process terminates, the OS (SUSE Linux in this case) reclaims all of their memory (unless it's shared with some other process that's still running).

What other things in a C program (not C++) can cause permanent memory loss, i.e. memory still lost after the program completes and even after the terminal window closes? Only a reboot brings the memory back. I've read other posts about files not being closed causing problems, but I don't think I have that problem.

Just as memory from malloc and mmap is reclaimed, files opened by the process are closed automatically by the OS when it exits.

A few things, such as huge pages, can keep memory allocated after the process exits, but you would certainly know about it if you were using them. Apart from that, no.


However, if you define memory loss as memory that is not marked 'free' immediately, then a couple of things can happen.

  1. Writes to disk or to mmap'd files may be cached in RAM for a while. The OS must keep those pages around until it syncs them back to disk.
  2. Files READ by the process may remain in memory if the OS has nothing else to use that RAM for right now, on the reasonable assumption that it might need them soon and it's quicker to read the copy that's already in RAM. Again, if the OS or another process needs some of that RAM, it can be discarded instantly.

Note that, as someone who paid for all my RAM, I would rather the OS used ALL of it ALL the time, if that helps in even the smallest way. Free RAM is wasted RAM.


The main problem with having little free RAM arises when RAM is overcommitted, which is to say that more processes (and the OS) are asking for or using RAM right now than is available on the system. It sounds like your program is using a few GB of RAM (4% of 64 GB is roughly 2.5 GB), which might be a problem (and remember the OS needs a good chunk too), but it sounds like you have plenty of RAM! Try running half the number of threads and see if it gets better.

Sometimes a memory leak can cause temporary overcommitment, so it's a good idea to look into that. Try plotting your program's memory use over time: if it rises continuously, it may well be a leak.

Note that forking a process creates a child that shares the parent's memory pages (copy-on-write) until either process modifies them, calls exec, or exits. But you aren't doing that.

Is it valid to look at top and free for memory statistics, i.e. do they accurately describe the memory situation? They do seem to correspond to the slowness of the program.


Yes, top and ps are perfectly reasonable ways to look at memory; in particular, watch the RES column. Ignore the VIRT column for now. In addition:

To see what the whole system is doing with memory, run:

vmstat 10

while your program is running and for a while afterwards. Look at what happens to the columns under the ---memory--- heading.

In addition, after your process has finished, run

cat /proc/meminfo

and post the results in your question.

If top only shows the program using about 4% of memory, will something like valgrind find this problem?

Probably, but it can be extremely slow, which might be impractical in this case. There are plenty of other tools that can help, such as Electric Fence, and others that do not slow your program down noticeably. I've even rolled my own in the past.

malloc()/free() work on the heap. This memory is guaranteed to be released to the OS when the process terminates. It is possible to leak memory that outlives the allocating process by using certain shared-memory primitives (e.g. System V IPC). However, I don't think any of this is directly relevant.

Stepping back a bit, here's output from a lightly-loaded Linux server:

$ uptime
 03:30:56 up 72 days,  8:42,  2 users,  load average: 0.06, 0.17, 0.27
$ free -m
             total       used       free     shared    buffers     cached
Mem:         24104      23452        652          0      15821        978
-/+ buffers/cache:       6651      17453
Swap:         3811          5       3806

Oh no, only 652 MB free! Right? Wrong.

Whenever Linux accesses a block device (say, a hard drive), it looks for any unused memory and stores a copy of the data there. After all, why not? The data's already in RAM, some program clearly wanted that data, and RAM that's unused can't do anyone any good. If a program comes along and asks for more memory, the cached data is discarded to make room; until then, the kernel might as well hang onto it.

The key line in this free output is not the Mem: line but the -/+ buffers/cache line below it. Yes, 23.4 GB of RAM is being used, but 17.4 GB is available for programs that want it. See "Help! Linux ate my RAM!" for more.
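Worked through with the numbers above, the -/+ buffers/cache line is simply the Mem: line with buffers and cache moved from "used" to "free":

```latex
\begin{aligned}
\text{used by programs} &= \text{used} - \text{buffers} - \text{cached}
  = 23452 - 15821 - 978 = 6653\ \text{MB} \\
\text{available to programs} &= \text{free} + \text{buffers} + \text{cached}
  = 652 + 15821 + 978 = 17451\ \text{MB}
\end{aligned}
```

The last digit differs slightly from what free printed (6651 and 17453), presumably because free does this arithmetic in kB and rounds each printed figure down to whole MB separately. Either way, the two figures add up to the 24104 MB total.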

I can't say why the program is getting slower, but the "free memory" metric steadily dropping to nothing is entirely normal and is not the cause.

The operating system only frees as much memory as it absolutely needs. Freeing memory is wasted effort if the memory is later used normally anyway: it's more efficient to move memory directly from one use to another than to free it only to have to hand it out again later.

The system only needs free memory for operations that cannot wait for used memory to be switched from one purpose to another. This is a very small set of unusual operations, such as servicing network interrupts.

If you run sysctl vm.min_free_kbytes, the system will tell you how many KB it needs to keep free. It's likely less than 100MB, so having any amount more than that free is perfectly fine.

If you want more of your memory free, remove it from the computer. Otherwise, the operating system assumes that there is zero cost to using it, and thus zero benefit to making it free.

For example, consider the data you wrote to disk. The operating system could free the memory that was holding that data, but that's a double loss. If the data you wrote to disk is later read, it will have to be read from disk rather than just grabbed from memory. And if that memory is later needed for some other purpose, the OS will have to undo the work it did to make it free. Yuck. So if the system doesn't absolutely need free memory, it won't free it.

My guess would be that the problem is not in your program but in the operating system. The OS keeps a cache of recently used files in memory on the assumption that you are going to access them again. It cannot know for certain which files will be needed, so it can end up keeping the wrong ones at the expense of the ones you wish it were keeping.

It may still be caching the output files of the first run when you do your second run, which prevents it from using the cache effectively on the second run. You can test this theory by deleting all files from the first run (which should evict them from the cache) and seeing whether the second run goes faster.

If that doesn't work, try deleting all the input files for the first run as well.

Answers

  1. Yes: neither the C nor the C++ standard requires memory that was not freed to be returned to the OS.
  2. Do you have memory-mapped files, open file handles for deleted files, etc.? Linux will not actually delete a file until all references to it have been released. Linux also caches file contents in memory in case they need to be read again; file-cache memory usage can be ignored, as the OS manages it.
  3. No.
  4. Maybe; valgrind will highlight cases where memory is not freed.

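Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.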

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM