
Slow SLAB/SLUB memory allocation

I have recently encountered a very strange issue that might be due to the kernel memory allocator. At first, I suspected some kind of memory bug in my C++ code, but the exact behavior I am seeing leads me to believe that it may not be a bug in the code. It's quite strange, but here's my best description of the problem.

I have an application that writes and overwrites files in the /dev/shm area of my machine. At the beginning of the program, it declares file pointers for all of the files it is going to write and continuously overwrite. These pointers are all created at the start of the program.

When I run the code, I notice the following. First, memory usage jumps to 4.3% of my system total (as reported by top). This happens right when I launch the executable. Then the CPU usage hovers around 40-50% before the code even starts doing anything. After about 2-3 minutes of this, the memory usage goes to 5.0% and there are no further increases. At that point, the CPU usage falls to 5-15%, which is the range the program usually runs at (due to the rate at which data is being passed to it).

Something is happening with memory behind the scenes during my program's startup, but I can't understand what it is. It feels like it shouldn't take 2-3 minutes to allocate 5% of system memory (1.2GB) on a modern x86_64 server. Note that after this strange startup, the program usually runs without issue.

However, today I had to increase the number of files the program writes to in /dev/shm and, accordingly, the number of pointers as well. This is where the trouble starts: during the startup procedure, the CPU usage suddenly jumps to 100% and stays there. This is a huge problem because it slows my application down massively, to below acceptable levels. The only difference between this and the working executable is the number of files I am having it write. To give specifics, I increased the number of files from 1345 to 1350. In fact, just one over 1346 is sufficient to trigger this 100% CPU issue.

I'm really at a loss about what I am dealing with here. I suspect it may have something to do with the SLAB/SLUB allocator (my system is CentOS 5.8 with a 2.6.35 kernel). Any ideas or hints about how to resolve this would be much appreciated.

I think it's unlikely to be a problem with SLUB. /dev/shm is implemented via tmpfs (on modern systems), which uses the page cache, not SLUB.
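One rough way to check this on your own machine is to watch the `Shmem` and `Slab` fields of /proc/meminfo while the program starts up. If the files in /dev/shm account for the memory, I would expect `Shmem` to grow rather than `Slab`. Below is a minimal sketch, assuming a Linux kernel that exposes both fields; the helper names `meminfo_field_kb` and `read_meminfo` are made up for this example.

```cpp
#include <fstream>
#include <sstream>
#include <string>

// Hypothetical helper: return the value (in kB) of a /proc/meminfo-style
// field such as "Shmem" or "Slab" found in `text`, or -1 if it is absent.
long meminfo_field_kb(const std::string& text, const std::string& key) {
    std::istringstream in(text);
    std::string line;
    const std::string prefix = key + ":";  // the colon avoids matching longer names
    while (std::getline(in, line)) {
        if (line.compare(0, prefix.size(), prefix) == 0) {
            std::istringstream fields(line.substr(prefix.size()));
            long kb = 0;
            if (fields >> kb) {
                return kb;
            }
        }
    }
    return -1;
}

// Hypothetical helper: read all of /proc/meminfo (Linux only).
// Returns an empty string if the file cannot be read.
std::string read_meminfo() {
    std::ifstream file("/proc/meminfo");
    std::stringstream contents;
    contents << file.rdbuf();
    return contents.str();
}
```

Calling `meminfo_field_kb(read_meminfo(), "Shmem")` and `meminfo_field_kb(read_meminfo(), "Slab")` every few seconds during startup and comparing the two trends should give a first hint about where the memory is going.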

You need to work out what your program is doing when it's chewing CPU. You could start with strace, which will at least show you whether your program is spending lots of time in the kernel or in your own code. From there you should learn to use perf.
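If you want a quick first answer to the kernel-versus-user question from inside the program itself, you can also compare user and system CPU time with `getrusage`. This is not a replacement for strace or perf, just a sketch of a different, coarser technique; the struct `CpuSplit` and the function names are hypothetical and not from your code.

```cpp
#include <sys/resource.h>
#include <sys/time.h>

// CPU time consumed by the process, split by where it was spent.
struct CpuSplit {
    double user_seconds;    // time spent executing user-space code
    double system_seconds;  // time spent in the kernel on the process's behalf
};

// Convert a timeval (seconds + microseconds) to fractional seconds.
static double to_seconds(const timeval& tv) {
    return static_cast<double>(tv.tv_sec) +
           static_cast<double>(tv.tv_usec) / 1e6;
}

// Snapshot the CPU time used so far by the whole process (all threads).
CpuSplit cpu_time_now() {
    struct rusage usage{};
    getrusage(RUSAGE_SELF, &usage);
    return {to_seconds(usage.ru_utime), to_seconds(usage.ru_stime)};
}

// CPU time consumed since an earlier snapshot.
CpuSplit cpu_time_since(const CpuSplit& start) {
    const CpuSplit now = cpu_time_now();
    return {now.user_seconds - start.user_seconds,
            now.system_seconds - start.system_seconds};
}
```

Take a snapshot with `cpu_time_now()` before the startup phase and call `cpu_time_since()` right after it. If system time dominates, the syscall trace from strace is probably the place to look; if user time dominates, a user-space profile from perf is likely to be more useful.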
