
Multithreaded Memory Allocators for C/C++

Original post: 2008-09-29 02:13:02 · tags: c++, c, memory, malloc, allocation

I currently have a heavily multi-threaded server application, and I'm shopping around for a good multi-threaded memory allocator.

So far I'm torn between:

Sun's umem
Google's tcmalloc
Intel's Threading Building Blocks allocator
Emery Berger's Hoard

From what I've found, Hoard might be the fastest, but I hadn't heard of it before today, so I'm skeptical whether it's really as good as it seems. Does anyone have personal experience trying out these allocators?

8 answers

I've used tcmalloc and read about Hoard. Both have similar implementations and both achieve roughly linear performance scaling with respect to the number of threads/CPUs (according to the graphs on their respective sites).

So: if performance is really that crucial, then do performance/load testing. Otherwise, just roll a die and pick one of those listed (weighted by ease of use on your target platform).

And from trshiv's link, it looks like Hoard, tcmalloc, and ptmalloc are all roughly comparable for speed. Overall, it looks like ptmalloc is optimized for taking as little room as possible, Hoard is optimized for a trade-off of speed and memory usage, and tcmalloc is optimized for pure speed.

The only way to really tell which memory allocator is right for your application is to try a few out. All of the allocators mentioned were written by smart folks and will beat the others on one particular microbenchmark or another. If all your application does all day long is malloc one 8-byte chunk in thread A and free it in thread B, and it doesn't need to handle anything else at all, you could probably write a memory allocator that beats the pants off any of those listed so far. It just won't be very useful for much else. :)
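
A minimal sketch of the kind of trial run suggested above, assuming C++11 threads. The function name run_alloc_benchmark and its parameters are made up for illustration. It calls plain std::malloc and std::free, so it measures whichever allocator the process ends up using. The uniform allocation pattern is only a placeholder; a meaningful test would replay your application's own pattern of sizes, lifetimes, and cross-thread frees.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdlib>
#include <thread>
#include <vector>

// Hypothetical helper: every thread repeatedly allocates a batch of blocks,
// touches each one, and frees the batch. Returns wall-clock seconds.
double run_alloc_benchmark(unsigned num_threads, std::size_t rounds,
                           std::size_t batch, std::size_t block_size) {
    assert(block_size > 0);
    std::atomic<std::size_t> failures{0};

    auto worker = [&] {
        std::vector<void*> blocks(batch);
        for (std::size_t r = 0; r < rounds; ++r) {
            for (void*& p : blocks) {
                p = std::malloc(block_size);
                if (p == nullptr) {
                    ++failures;
                    continue;
                }
                // Write through a volatile pointer so the compiler cannot
                // optimize the malloc/free pair away.
                *static_cast<volatile char*>(p) = 1;
            }
            for (void* p : blocks) std::free(p);  // free(nullptr) is a no-op
        }
    };

    const auto start = std::chrono::steady_clock::now();
    std::vector<std::thread> threads;
    for (unsigned i = 0; i < num_threads; ++i) threads.emplace_back(worker);
    for (std::thread& t : threads) t.join();
    const auto stop = std::chrono::steady_clock::now();

    assert(failures.load() == 0);
    return std::chrono::duration<double>(stop - start).count();
}
```

Running the same binary once per candidate allocator and comparing the returned times for increasing thread counts gives a rough scaling curve, nothing more.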

I have some experience using Hoard where I work (enough that one of the more obscure bugs addressed in the recent 3.8 release was found as a result of that experience). It's a very good allocator, but how good it is for you depends on your workload. And you do have to pay for Hoard (though it's not too expensive) in order to use it in a commercial project without GPL'ing your code.

A very slightly adapted ptmalloc2 has been the allocator behind glibc's malloc for quite a while now, so it's incredibly widely used and tested. If stability is important above all things, it might be a good choice, but you didn't mention it in your list, so I'll assume it's out. For certain workloads it's terrible, but the same is true of any general-purpose malloc.

If you're willing to pay for it (and the price is reasonable, in my experience), SmartHeap SMP is also a good choice. Most of the other allocators mentioned are designed as drop-in malloc/free and new/delete replacements that can be LD_PRELOAD'd. SmartHeap can be used that way as well, but it also includes an entire allocation-related API that lets you fine-tune your allocators to your heart's content. In tests that we've done (again, very specific to a particular application), SmartHeap was about the same as Hoard for performance when acting as a drop-in malloc replacement; the real difference between the two is the degree of customization. The less general-purpose you need your allocator to be, the better the performance you can get.

And depending on your use case, a general-purpose multithreaded allocator might not be what you want to use at all; if you're constantly malloc'ing and free'ing objects that are all the same size, you might want to just write a simple slab allocator. Slab allocation is used in several places in the Linux kernel that fit that description. (I would give you a couple more useful links, but I'm a "new user" and Stack Overflow has decided that new users are not allowed to be too helpful all in one answer. Google can help out well enough, though.)
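
For what it's worth, the same-size case can be sketched as a fixed-size pool with an intrusive free list. This is a simplified illustration, not the Linux kernel's slab implementation: the class name FixedPool and its interface are made up here, the capacity is fixed at construction, and it is not thread-safe, so a multithreaded program would need one pool per thread or a lock around it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical fixed-size pool: every slot has the same size, and free slots
// are chained through their own storage, so allocate/release are O(1).
class FixedPool {
public:
    FixedPool(std::size_t object_size, std::size_t capacity)
        : slot_size_(round_up(object_size < sizeof(Slot) ? sizeof(Slot)
                                                         : object_size)),
          storage_(static_cast<unsigned char*>(
              std::malloc(slot_size_ * capacity))) {
        assert(capacity > 0);
        if (storage_ == nullptr) throw std::bad_alloc();
        // Push every slot onto the free list.
        for (std::size_t i = 0; i < capacity; ++i) {
            release(storage_ + i * slot_size_);
        }
    }
    ~FixedPool() { std::free(storage_); }
    FixedPool(const FixedPool&) = delete;
    FixedPool& operator=(const FixedPool&) = delete;

    // Returns a free slot, or nullptr when the pool is exhausted.
    void* allocate() {
        if (free_list_ == nullptr) return nullptr;
        Slot* slot = free_list_;
        free_list_ = slot->next;
        return slot;
    }

    // Returns a slot to the pool; p must have come from allocate().
    void release(void* p) {
        Slot* slot = static_cast<Slot*>(p);
        slot->next = free_list_;
        free_list_ = slot;
    }

private:
    struct Slot { Slot* next; };

    // Round slot sizes up so every slot is suitably aligned for any object.
    static std::size_t round_up(std::size_t n) {
        const std::size_t a = alignof(std::max_align_t);
        return (n + a - 1) / a * a;
    }

    std::size_t slot_size_;
    unsigned char* storage_;
    Slot* free_list_ = nullptr;
};
```

Because a released slot goes back on the head of the list, the most recently freed slot is the next one handed out, which tends to keep hot objects in cache.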

I personally prefer and recommend ptmalloc as a multithreaded allocator. Hoard is good, but in the evaluation my team did between Hoard and ptmalloc a few years ago, ptmalloc was better. From what I know, ptmalloc has been around for a number of years and is quite widely used as a multithreaded allocator.

You might find this comparison useful.

Maybe this is the wrong way to approach what you are asking, but a different tactic could be employed altogether. If you are looking for a really fast memory allocator, maybe you should ask why you need to spend all that time allocating memory when you could perhaps get away with stack allocation of variables. Stack allocation, while way more annoying, could, when done right, save you a lot of mutex contention and keep strange memory corruption issues out of your code. You also potentially get less fragmentation, which could help.
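
As a rough illustration of that tactic, here is a sketch in which the scratch space lives on the calling thread's stack, so the function never touches the heap or any allocator lock. The function sum_fields, the constant kMaxFields, and the comma-separated input format are all assumptions made up for this example; the approach only works when you can put a fixed upper bound on the data.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdlib>

constexpr std::size_t kMaxFields = 64;  // assumed upper bound for this sketch

// Hypothetical handler: parses comma-separated integers into a scratch
// buffer on the stack and sums them. Returns false if the input holds more
// than kMaxFields values or contains something that is not a number.
bool sum_fields(const char* csv, long& total) {
    std::array<long, kMaxFields> fields{};  // stack storage, no malloc
    std::size_t count = 0;

    const char* p = csv;
    while (*p != '\0') {
        if (count == kMaxFields) return false;  // would overflow the buffer
        char* end = nullptr;
        const long value = std::strtol(p, &end, 10);
        if (end == p) return false;  // nothing parsed at this position
        fields[count++] = value;
        p = (*end == ',') ? end + 1 : end;  // skip the separator if present
    }

    total = 0;
    for (std::size_t i = 0; i < count; ++i) total += fields[i];
    return true;
}
```

The trade-off is the hard limit: input larger than the stack buffer has to be rejected or handled by a slower heap path.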

We used Hoard on a project where I worked a few years ago. It seemed to work great. I have no experience with the other allocators. It should be pretty easy to try different ones and do load testing, no?

The locklessinc allocator is very good, and the developer is responsive if you have questions. He wrote an article about some of the optimization tricks used, which is an interesting read: http://locklessinc.com/articles/allocator_tricks/ . I've used it in the past with excellent results.


Probably a late response to your question, but

why do mallocs at all if you have performance hiccups?

A better way would be to malloc one big memory window at initialization and then come up with a lightweight memory manager that leases out memory chunks at run time.

This avoids any possibility of system calls when the heap would otherwise have to expand.
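
One way to read that suggestion is a bump (arena) allocator: a single malloc up front, then chunks handed out by advancing an offset. The sketch below is only an illustration under that reading. The class name Arena and the methods lease and reset are made up, it is not thread-safe, and individual chunks cannot be returned; the whole arena is recycled at once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical arena: grabs one block at construction and leases out
// aligned chunks from it without calling malloc again.
class Arena {
public:
    explicit Arena(std::size_t bytes)
        : base_(static_cast<unsigned char*>(std::malloc(bytes))),
          size_(bytes) {
        if (base_ == nullptr) throw std::bad_alloc();
    }
    ~Arena() { std::free(base_); }
    Arena(const Arena&) = delete;
    Arena& operator=(const Arena&) = delete;

    // Returns a chunk of at least `bytes` bytes, or nullptr if the arena
    // does not have enough room left.
    void* lease(std::size_t bytes) {
        const std::size_t a = alignof(std::max_align_t);
        const std::size_t rounded = (bytes + a - 1) / a * a;
        // The first check catches wrap-around for absurdly large requests.
        if (rounded < bytes || rounded > size_ - used_) return nullptr;
        void* p = base_ + used_;
        used_ += rounded;
        return p;
    }

    // Makes the whole arena available again; all leased chunks become invalid.
    void reset() { used_ = 0; }

    std::size_t used() const { return used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_ = 0;
};
```

A per-thread or per-request arena that is reset when the work is finished fits this design best; long-lived objects with mixed lifetimes do not.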

You could try ltalloc (a general-purpose global memory allocator with the speed of a fast pool allocator).