
Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

Posted 2010-06-05 22:17:47. Tags: c#, java, c++, memory, garbage-collection

I found this article here:

Quantifying the Performance of Garbage Collection vs. Explicit Memory Management

http://www.cs.umass.edu/~emery/pubs/gcvsmalloc.pdf

In the conclusion section, it reads:

Comparing runtime, space consumption, and virtual memory footprints over a range of benchmarks, we show that the runtime performance of the best-performing garbage collector is competitive with explicit memory management when given enough memory. In particular, when garbage collection has five times as much memory as required, its runtime performance matches or slightly exceeds that of explicit memory management. However, garbage collection's performance degrades substantially when it must use smaller heaps. With three times as much memory, it runs 17% slower on average, and with twice as much memory, it runs 70% slower. Garbage collection also is more susceptible to paging when physical memory is scarce. In such conditions, all of the garbage collectors we examine here suffer order-of-magnitude performance penalties relative to explicit memory management.

So, if my understanding is correct: if I have an app written in native C++ requiring 100 MB of memory, then to achieve the same performance with a "managed" (i.e. garbage-collector-based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)
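The arithmetic in the question can be written out as a small lookup. This is a sketch that encodes only the three heap-size ratios quoted from the paper's conclusion (5x: roughly on par, 3x: 17% slower on average, 2x: 70% slower); the function names are hypothetical, and the step-wise fallback between quoted ratios is an assumption of this sketch, not something the paper states. The paper reports benchmark averages, not a per-application formula.

```cpp
#include <cassert>
#include <optional>

// Heap size implied by giving the collector `multiple` times the memory that
// explicit memory management needs, e.g. 100 MB * 5 = 500 MB.
double gc_heap_mb(double explicit_mb, double multiple) {
    assert(multiple > 0.0);
    return explicit_mb * multiple;
}

// Average runtime relative to explicit memory management, using only the
// three data points quoted from the paper's conclusion. Hypothetical helper:
// ratios between quoted points fall back to the nearest quoted ratio below
// them (a pessimistic assumption); below 2x nothing was quoted.
std::optional<double> reported_relative_runtime(double multiple) {
    if (multiple >= 5.0) return 1.00;  // "matches or slightly exceeds"
    if (multiple >= 3.0) return 1.17;  // "17% slower on average"
    if (multiple >= 2.0) return 1.70;  // "70% slower"
    return std::nullopt;               // not covered by the quoted figures
}
```

For the question's example, gc_heap_mb(100, 5) gives 500 MB and reported_relative_runtime(2) gives 1.70, i.e. the asker's reading taken literally.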

Do you know if current garbage collectors (i.e. those in the latest Java VMs and .NET 4.0) suffer from the same problems described in the aforementioned article? Has the performance of modern garbage collectors improved?

Thanks.

6 answers

if I have an app written in native C++ requiring 100 MB of memory, to achieve the same performance with a "managed" (i.e. garbage-collector-based) language (e.g. Java, C#), the app should require 5*100 MB = 500 MB? (And with 2*100 MB = 200 MB, the managed app would run 70% slower than the native app?)

Only if the app is bottlenecked on allocating and deallocating memory. Note that the paper talks exclusively about the performance of the garbage collector itself.

You seem to be asking two things:

whether GCs have improved since that research was performed, and
whether you can use the conclusions of the paper as a formula to predict required memory.

The answer to the first is that there have been no major breakthroughs in GC algorithms that would invalidate the general conclusions:

GC'ed memory management still requires significantly more virtual memory.
If you try to constrain the heap size, GC performance drops significantly.
If real memory is restricted, the GC'ed memory management approach results in substantially worse performance due to paging overheads.

However, the conclusions cannot really be used as a formula:

The original study was done with JikesRVM rather than a Sun JVM.
The Sun JVM's garbage collectors have improved in the ~5 years since the study.
The study does not seem to take into account that Java data structures take more space than equivalent C++ data structures for reasons that are not GC related.

On the last point, I have seen a presentation by someone that talks about Java memory overheads. For instance, it found that the minimum representation size of a Java String is something like 48 bytes. (A String consists of two primitive objects: one an Object with 4 word-sized fields and the other an array with a minimum of 1 word of content. Each primitive object also has 3 or 4 words of overhead.) Java collection data structures similarly use far more memory than people realize.
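To see where a figure like 48 bytes comes from, the accounting in the parenthesis above can be turned into arithmetic. This sketch assumes 4-byte words (a 32-bit JVM) and the stated 3 or 4 words of per-object overhead; the helper is hypothetical and does not model the exact layout of any particular JVM version.

```cpp
#include <cassert>
#include <cstddef>

// Rough model of the accounting above for a minimal java.lang.String:
// one String object with 4 word-sized fields, plus one char array with at
// least 1 word of content; each object carries `header_words` words of
// overhead. Hypothetical helper, not a real JVM layout.
std::size_t min_java_string_bytes(std::size_t word_bytes, std::size_t header_words) {
    assert(word_bytes > 0);
    const std::size_t string_object_words = header_words + 4;  // header + 4 fields
    const std::size_t char_array_words = header_words + 1;     // header + >= 1 word of chars
    return (string_object_words + char_array_words) * word_bytes;
}
```

With 3 header words this gives 44 bytes, and with 4 it gives 52 bytes, which brackets the "something like 48 bytes" figure.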

These overheads are not GC-related per se. Rather, they are direct and indirect consequences of design decisions in the Java language, JVM and class libraries. For example:

Each Java primitive object header¹ reserves one word for the object's "identity hashcode" value, and one or more words for representing the object lock.
The representation of a String has to use a separate "array of characters" because of JVM limitations. Two of the three other fields are an attempt to make the substring operation less memory intensive.
The Java collection types use a lot of memory because collection elements cannot be directly chained. So, for example, the overheads of a (hypothetical) singly linked list collection class in Java would be 6 words per list element. By contrast, an optimal C/C++ linked list (i.e. with each element having a "next" pointer) has an overhead of one word per list element.
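For comparison, the C/C++ layout described in the last point puts the "next" pointer inside the element itself (an intrusive list). A minimal sketch with hypothetical names; the payload is pointer-sized only so the one-word overhead is easy to check with sizeof.

```cpp
#include <cassert>
#include <cstdint>

// Intrusive singly linked list: the "next" pointer lives inside the element
// itself, so the only per-element overhead is one pointer (one word).
struct Node {
    std::uintptr_t value;  // payload, pointer-sized for this example
    Node* next = nullptr;  // the single word of list overhead
};

// Link an existing element at the front of the list; nothing is allocated.
void push_front(Node*& head, Node& n) {
    n.next = head;
    head = &n;
}

// Walk the chain and add up the payloads.
std::uintptr_t sum(const Node* head) {
    std::uintptr_t total = 0;
    for (const Node* p = head; p != nullptr; p = p->next) total += p->value;
    return total;
}
```

Because the elements own their links, a list of existing objects needs no separate node allocations, unlike a Java collection that wraps each element in its own node object.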

¹ In fact, the overheads are less than this on average. The JVM only "inflates" a lock following use and contention, and similar tricks are used for the identity hashcode. The fixed overhead is only a few bits. However, these bits add up to a measurably larger object header ... which is the real point here.

Michael Borgwardt is kind of right that this matters only if the application is bottlenecked on allocating memory. This follows from Amdahl's law.
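Amdahl's law makes this concrete. A minimal sketch, assuming the slowdown applies only to the memory-management portion of the runtime; the fraction of time spent allocating and freeing is a hypothetical input that is application-specific and has to be measured.

```cpp
#include <cassert>

// Amdahl-style estimate: if a fraction `mm_fraction` of the runtime is spent
// in memory management and only that part becomes `mm_slowdown` times slower,
// the whole program becomes this many times slower.
double overall_slowdown(double mm_fraction, double mm_slowdown) {
    assert(mm_fraction >= 0.0 && mm_fraction <= 1.0);
    return (1.0 - mm_fraction) + mm_fraction * mm_slowdown;
}
```

For example, overall_slowdown(0.1, 1.7) is about 1.07: a 70% slowdown confined to memory management costs roughly 7% overall if allocation accounts for 10% of the runtime. Only a program that does nothing but allocate (fraction 1.0) sees the full 1.7x.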

However, I have used C++, Java, and VB .NET. In C++ there are powerful techniques available that allocate memory on the stack instead of the heap. Stack allocation is easily hundreds of times faster than heap allocation. I would say that use of these techniques could remove maybe one allocation in eight, and use of writable strings one allocation in four.
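A minimal sketch of the two C++ techniques mentioned: stack allocation via a fixed-size std::array instead of a heap-backed std::vector, and reusing one writable std::string buffer across calls. Function names are hypothetical, and no timings are claimed here; the actual speedup depends on the allocator and the workload.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Heap version: std::vector obtains its storage from the free store.
int sum_squares_heap(std::size_t n) {
    std::vector<int> v(n);
    for (std::size_t i = 0; i < n; ++i) v[i] = static_cast<int>(i * i);
    int total = 0;
    for (int x : v) total += x;
    return total;
}

// Stack version: N is a compile-time constant, so std::array lives in the
// function's stack frame and no heap allocation is needed for it.
template <std::size_t N>
int sum_squares_stack() {
    std::array<int, N> a{};
    for (std::size_t i = 0; i < N; ++i) a[i] = static_cast<int>(i * i);
    int total = 0;
    for (int x : a) total += x;
    return total;
}

// "Writable string": the caller owns one buffer and passes it in each time,
// instead of a new string being built and returned on every call.
void format_label(std::string& out, int id) {
    out.clear();  // empties the contents; common implementations keep the capacity
    out += "item-";
    out += std::to_string(id);
}
```

Usage: sum_squares_heap(4) and sum_squares_stack<4>() both return 14; calling format_label(buf, 42) and then format_label(buf, 7) reuses the same buffer and leaves "item-7" in it.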

It's no joke when people claim highly optimized C++ code can trounce the best possible Java code. It's the flat-out truth.

Microsoft claims the overhead in using any of the .NET family of languages over C++ is about two to one. I believe that number is just about right for most things.

HOWEVER, managed environments carry a particular benefit: when dealing with inferior programmers, you don't have to worry about one module trashing another module's memory, with the resulting crash being blamed on the wrong developer and the bug being difficult to find.

At least as I read it, your real question is whether there have been significant developments in garbage collection or manual memory management since that paper was published that would invalidate its results. The answer to that is somewhat mixed. On one hand, the vendors who provide garbage collectors do tune them, so their performance tends to improve over time. On the other hand, there hasn't been anything like a major breakthrough, such as a major new garbage collection algorithm.

Manual heap managers generally improve over time as well. I doubt most are tuned with quite the regularity of garbage collectors, but in the course of 5 years, most have probably had at least a bit of work done.

In short, both have undoubtedly improved at least a little, but in neither case have there been major new algorithms that change the fundamental landscape. It's doubtful that current implementations would give a difference of exactly 17% as quoted in the article, but there's a pretty good chance that if you repeated the tests today, you'd still get a difference somewhere around 15-20%. The differences between then and now are probably smaller than the differences between some of the different algorithms they tested at that time.

I am not sure how relevant your question still is today. A performance-critical application shouldn't spend a significant portion of its time creating objects (as the micro-benchmarks are very likely to do), and performance on modern systems is more likely to be determined by how well the application fits into the CPU's cache than by how much main memory it uses.

BTW: there are lots of tricks you can do in C++ to support this that are not available in Java.

If you are worried about the cost of GC or object creation, you can take steps to minimise how many objects you create. This is generally a good idea where performance is critical in any language.
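One common way to minimise object creation is to recycle objects through a pool. A minimal C++ sketch, assuming single-threaded use and default-constructible objects; the class name Pool is hypothetical. The same idea carries over to Java, where reusing objects reduces allocation pressure on the GC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A fixed-capacity object pool: all objects are created once up front and
// then recycled, so acquire/release never allocate.
template <typename T>
class Pool {
public:
    explicit Pool(std::size_t capacity) : storage_(capacity) {
        free_.reserve(capacity);
        for (auto& obj : storage_) free_.push_back(&obj);
    }

    // Hand out an unused object, or nullptr if the pool is exhausted.
    T* acquire() {
        if (free_.empty()) return nullptr;
        T* obj = free_.back();
        free_.pop_back();
        return obj;
    }

    // Return an object for reuse; resetting its state is the caller's job.
    void release(T* obj) {
        assert(obj >= storage_.data() && obj < storage_.data() + storage_.size());
        free_.push_back(obj);
    }

    std::size_t available() const { return free_.size(); }

private:
    std::vector<T> storage_;  // every pooled object, allocated once
    std::vector<T*> free_;    // objects not currently handed out
};
```

The free list is LIFO, so the most recently released object is handed out next, which also tends to be the one most likely to still be in cache.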

The cost of main memory isn't as much of an issue as it used to be. A machine with 48 GB is relatively cheap these days: an 8-core server with 48 GB of main memory can be leased for £9/day. Try hiring a developer for £9/day. ;) However, what is still relatively expensive is CPU cache memory. It is fairly hard to find a system with more than 16 MB of CPU cache, compared with 48,000 MB of main memory. A system performs much better when an application is working within its CPU cache, and this is the amount of memory to consider if performance is critical.
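To illustrate why layout matters more than raw footprint here: the two functions below compute the same sum, but the std::vector traversal reads memory sequentially, while the std::list traversal follows pointers between separately allocated nodes. A minimal sketch; no timings are claimed, and fits_in_cache is a hypothetical back-of-the-envelope helper.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <numeric>
#include <vector>

// Contiguous storage: elements sit next to each other, so each cache line
// fetched from main memory brings in several neighbouring elements.
long long sum_contiguous(const std::vector<int>& v) {
    return std::accumulate(v.begin(), v.end(), 0LL);
}

// Node-based storage: each std::list element is a separate allocation, and
// traversal follows pointers that may land anywhere in memory.
long long sum_linked(const std::list<int>& l) {
    return std::accumulate(l.begin(), l.end(), 0LL);
}

// Rough working-set check against a cache budget such as 16 MB.
// Hypothetical helper: real caches are multi-level and shared between cores.
bool fits_in_cache(std::size_t elements, std::size_t bytes_per_element,
                   std::size_t cache_bytes) {
    return elements * bytes_per_element <= cache_bytes;
}
```

For example, one million 16-byte elements (16,000,000 bytes) just fit a 16 MiB budget, while two million do not.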

First, note that it's now 2019 and a lot of things have improved. As long as you don't trigger a GC, allocation is as simple as incrementing a pointer. In C++ it costs much more, unless you implement your own mechanism to allocate in chunks. And if you use shared smart pointers, each change to the reference count requires a locked increment (an xaddl instruction), which is slow in itself and requires the processors to communicate to invalidate and resynchronize their cache lines. What is more, with GC you get better locality in at least three ways. First, when it allocates a new segment, it zeroes the memory, which warms the cache lines. Second, it compacts the heap, which keeps data closer together. Lastly, each thread uses its own heap. In conclusion, although it's hard to test and compare every scenario and GC implementation, I've read somewhere on SO that it has been shown that GC performs better than manual memory management.
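The "allocate in chunks" mechanism mentioned above can be sketched as a bump-pointer arena. This is a minimal illustration under stated assumptions (the class name Arena is hypothetical; single-threaded; one fixed chunk with no growth). It shows why allocation can be as cheap as rounding and incrementing an offset, but unlike a GC it does not zero memory, compact, or free individual objects: everything is released at once, and destructors are never run.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Bump-pointer arena: one chunk is allocated up front, and each allocation
// rounds the current position up to the requested alignment and advances it.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new std::byte[capacity]), capacity_(capacity) {}

    // Returns nullptr when the chunk is exhausted (no fallback in this sketch).
    void* allocate(std::size_t size, std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        const auto base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t start =
            (base + offset_ + align - 1) & ~(static_cast<std::uintptr_t>(align) - 1);
        const std::size_t new_offset = static_cast<std::size_t>(start - base) + size;
        if (new_offset > capacity_) return nullptr;
        offset_ = new_offset;
        return reinterpret_cast<void*>(start);
    }

    // Releases everything at once; there is no per-object deallocation.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<std::byte[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_ = 0;
};
```

Usage: with Arena arena(256), two calls to arena.allocate(10, 8) and arena.allocate(8, 8) return increasing, 8-byte-aligned addresses from the same chunk; a request that does not fit returns nullptr, and reset() makes the whole chunk available again.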