
Lock contention in memory allocation - multi-threaded vs. multi-process

Original post: 2016-09-15 11:11:35. Tags: linux, multithreading, memory-management, multiprocessing, contention

We have developed a large C++ application that runs satisfactorily at several sites on big Linux and Solaris boxes (up to 160 CPU cores or even more). It is a heavily multi-threaded (1000+ threads), single-process architecture that consumes huge amounts of memory (200 GB+). We LD_PRELOAD Google Perftools' tcmalloc (or libumem/mtmalloc on Solaris) to avoid memory allocation performance bottlenecks, with generally good results. However, we are starting to see adverse effects of lock contention during memory allocation/deallocation on some bigger installations, especially after the process has been running for a while (which hints at aging/fragmentation effects in the allocator).

We are considering a change to a multi-process/shared-memory architecture (the heavy allocation/deallocation would not happen in shared memory, but on the regular heap).

So, finally, here's our question: can we assume that the virtual memory manager of modern Linux kernels is capable of efficiently handing out memory to hundreds of concurrent processes? Or should we expect to run into the same kind of memory allocation contention that we see in our single-process/multi-threaded environment? I tend to hope for better overall system performance, as we would no longer be limited to a single address space, and having several independent address spaces would require less locking on the part of the virtual memory manager. Does anyone have actual experience or performance data comparing multi-threaded vs. multi-process memory allocation?

1 Answer

I tend to hope for a better overall system performance, as we would no longer be limited to a single address space, and that having several independent address spaces would require less locking on the part of the virtual memory manager.

There is no reason to expect this. Unless your code is so badly designed that it constantly goes back to the OS to allocate memory, it won't make any significant difference. Your application should only need to go back to the OS's virtual memory manager when it needs more virtual memory, which should rarely happen once the process reaches its stable size.

If you are constantly allocating and freeing all the way back to the OS, you should stop doing that. If you're not, then you can keep multiple pools of already-allocated memory that can be used by multiple threads without contention. And, as a benefit, context switches between threads of one process are cheaper than switches between processes, because the TLBs don't have to be flushed.
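To make the "multiple pools" idea concrete, here is a minimal sketch of a per-thread, fixed-size block pool in C++. It is only an illustration, not how tcmalloc or any other allocator is actually implemented. The names `ThreadLocalPool` and `local_pool` are hypothetical, and the sketch assumes that every block is freed on the same thread that allocated it and that no block outlives its thread.

```cpp
#include <cassert>
#include <cstddef>
#include <new>
#include <vector>

// Hypothetical fixed-size block pool. Each thread owns its own instance,
// so allocate() and deallocate() need no locks at all.
template <std::size_t BlockSize, std::size_t BlocksPerChunk = 1024>
class ThreadLocalPool {
    static_assert(BlockSize > 0, "BlockSize must be non-zero");

public:
    ThreadLocalPool() = default;
    ThreadLocalPool(const ThreadLocalPool&) = delete;
    ThreadLocalPool& operator=(const ThreadLocalPool&) = delete;

    ~ThreadLocalPool() {
        // Chunks go back to the underlying allocator only when the pool
        // (and therefore the owning thread) goes away.
        for (void* chunk : chunks_) ::operator delete(chunk);
    }

    void* allocate() {
        if (free_list_ == nullptr) refill();  // only path that hits the global heap
        Node* node = free_list_;
        free_list_ = node->next;
        return node;
    }

    void deallocate(void* p) {
        assert(p != nullptr);
        // Reuse the block's own storage as the free-list link.
        free_list_ = ::new (p) Node{free_list_};
    }

    std::size_t chunk_count() const { return chunks_.size(); }

private:
    struct Node { Node* next; };

    // Round the block size up so every block is suitably aligned and is
    // large enough to hold a free-list link.
    static constexpr std::size_t kAlign = alignof(std::max_align_t);
    static constexpr std::size_t kStride = (BlockSize + kAlign - 1) / kAlign * kAlign;

    void refill() {
        chunks_.push_back(nullptr);  // reserve the slot first to avoid a leak on throw
        chunks_.back() = ::operator new(kStride * BlocksPerChunk);
        char* base = static_cast<char*>(chunks_.back());
        for (std::size_t i = 0; i < BlocksPerChunk; ++i) {
            deallocate(base + i * kStride);
        }
    }

    Node* free_list_ = nullptr;
    std::vector<void*> chunks_;
};

// One pool per thread and per block size.
template <std::size_t BlockSize>
ThreadLocalPool<BlockSize>& local_pool() {
    thread_local ThreadLocalPool<BlockSize> pool;
    return pool;
}
```

A thread would call `local_pool<64>().allocate()` and later `local_pool<64>().deallocate(p)`. The shared heap is touched only when a pool has to refill, once per 1024 blocks in this sketch. Blocks that are handed to another thread and freed there would need a different scheme (for example, returning them through a queue to the owning thread), which this sketch deliberately leaves out.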

Only if you can't reduce the frequency of address space changes (for example, if you must map and unmap files) or if you have to change other shared resources (like file descriptors) should you look at multi-process options.