
Java - Why do GC times increase as the heap grows (in terms of capacity)?

At least for the older GCs, this holds true. (I know there are newer ones, like ZGC and Shenandoah, that aim to eliminate it.)

As far as I know, the GC keeps track of live objects, so shouldn't GC times be affected mostly by the number of objects (live ones and those that need to be cleared)?

EDIT: I meant growth in terms of capacity, meaning a bigger heap but the same utilization of it by the application.

Didn't you answer your own question?

"As far as I know, the GC keeps track of live objects, so shouldn't GC times be affected mostly by the number of objects (live ones and those that need to be cleared)?"

The more the heap grows, the more live objects it tends to hold, and the slower the GC gets (I'm sure there are exceptions to this rule, in particular for minor collections, but that's the rough idea). The number of objects to be cleared is largely irrelevant; what matters most is the total number of live objects. Now, if your heap is growing because you're storing long-lived objects, it might be OK as long as you don't keep adding more and more of them. Eventually, long-lived objects are promoted from the survivor spaces to the old generation, where they are only subject to major collections and not minor ones. As long as minor GCs always free enough memory in the young generation, a major GC won't be triggered on all objects (which include the long-lived ones).

I have also observed a different behaviour with G1. We had a low-latency application (40 ms p99), so we attempted to configure G1 to make very short pauses (I can't remember how short, maybe 5 ms or so). What happened is that G1 was more or less meeting the 5 ms target, but it had to run extremely frequently, because 5 ms was not enough to cope with all the dead objects we had in our heap. Therefore, it's not exactly true to say that individual garbage collection runs are going to get slower with increased heap size; however, the average time spent in garbage collection over a given period of time is most likely going to increase.
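To check the "time spent in GC over a given period" claim on your own application, one option is to read the JVM's collector counters before and after a workload. The following is a minimal sketch using the standard java.lang.management API; the class name GcOverhead, its helper methods, and the allocation loop are hypothetical stand-ins for a real workload, and the reported times are approximate by the API's own definition.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical helper class, not part of any library.
class GcOverhead {
    /** Sums the accumulated collection time (ms) over all collectors; undefined (-1) values are skipped. */
    static long totalGcMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    /** Sums the number of collections over all collectors; undefined (-1) values are skipped. */
    static long totalGcCount() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    /** Fraction of wall-clock time spent in GC; returns 0 when no time has elapsed. */
    static double gcFraction(long gcMillis, long wallMillis) {
        return wallMillis <= 0 ? 0.0 : (double) gcMillis / wallMillis;
    }

    public static void main(String[] args) {
        long startNanos = System.nanoTime();
        long gcMillisBefore = totalGcMillis();
        long gcCountBefore = totalGcCount();

        // Stand-in workload: churn through short-lived arrays to create garbage.
        long checksum = 0;
        for (int i = 0; i < 200_000; i++) {
            byte[] junk = new byte[4096];
            junk[0] = (byte) i;
            checksum += junk[0];
        }

        long wallMillis = (System.nanoTime() - startNanos) / 1_000_000;
        long gcMillis = totalGcMillis() - gcMillisBefore;
        long gcCount = totalGcCount() - gcCountBefore;

        System.out.println("checksum=" + checksum);
        System.out.println("collections=" + gcCount
                + " gcMillis=" + gcMillis
                + " wallMillis=" + wallMillis
                + " gcFraction=" + gcFraction(gcMillis, wallMillis));
    }
}
```

Running the same workload with different heap sizes or pause targets and comparing both the collection count and the GC fraction shows whether shorter individual pauses are being paid for with more frequent collections.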

There are many different algorithms that can be used to implement garbage collection. Not all of them exhibit the behaviour you mention.

In the case of your question, you are referring to algorithms that use a form of mark-sweep. If we take the HotSpot JVM as an example, the old generation can be collected using the CMS collector. This uses a marking phase, where all objects that are accessible from application code are marked. Initially, a root set of directly accessible objects (object references on the stack, in registers, etc.) is created. Each object in this set has the mark bit set in its header to indicate that it is still in use. All references from these objects are followed recursively, and ultimately every accessible object has the mark bit set. How long this takes is proportional to the number of live objects, not the size of the heap.

The sweeping phase then has to sweep through the entire heap, looking for objects with the mark bit set and determining the gaps between them so that they can be added to free lists. These are used to allocate space for objects being promoted from the young generation. Since the whole heap must be swept, the time this takes is proportional to the size of the heap, regardless of how much live data is in the heap.
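The difference in cost between the two phases can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not how HotSpot or CMS is implemented: the heap is a fixed array of slots, live objects form a single chain from one root, and the names ToyMarkSweep, Slot, markSteps and sweepSteps are all made up for the illustration.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of mark-sweep; every name here is hypothetical.
class ToyMarkSweep {
    /** One heap slot: either free or holding an object with outgoing references. */
    static final class Slot {
        boolean allocated;
        boolean marked;
        final List<Integer> refs = new ArrayList<>(); // indices of referenced slots
    }

    final Slot[] heap;
    long markSteps;  // slots visited while marking
    long sweepSteps; // slots visited while sweeping

    ToyMarkSweep(int capacity) {
        heap = new Slot[capacity];
        for (int i = 0; i < capacity; i++) {
            heap[i] = new Slot();
        }
    }

    /** Fills slots 0..live-1 with a reachable chain, then adds unreachable garbage after it. */
    void populate(int live, int garbage) {
        for (int i = 0; i < live + garbage; i++) {
            heap[i].allocated = true;
        }
        for (int i = 0; i + 1 < live; i++) {
            heap[i].refs.add(i + 1); // only the live objects are linked from the root
        }
    }

    /** Marking follows references from the root, so it only touches reachable objects. */
    void mark(int root) {
        Deque<Integer> pending = new ArrayDeque<>();
        pending.push(root);
        while (!pending.isEmpty()) {
            Slot slot = heap[pending.pop()];
            if (slot.marked) {
                continue;
            }
            slot.marked = true;
            markSteps++;
            for (int ref : slot.refs) {
                pending.push(ref);
            }
        }
    }

    /** Sweeping walks every slot in the heap, whether it is live, garbage, or free. */
    int sweep() {
        int freed = 0;
        for (Slot slot : heap) {
            sweepSteps++;
            if (slot.allocated && !slot.marked) {
                slot.allocated = false;
                slot.refs.clear();
                freed++;
            }
            slot.marked = false; // reset for the next cycle
        }
        return freed;
    }

    /** Runs one collection; requires live >= 1 and live + garbage <= capacity. */
    static long[] collect(int capacity, int live, int garbage) {
        ToyMarkSweep gc = new ToyMarkSweep(capacity);
        gc.populate(live, garbage);
        gc.mark(0);
        gc.sweep();
        return new long[] {gc.markSteps, gc.sweepSteps};
    }

    public static void main(String[] args) {
        for (int capacity : new int[] {1_000, 10_000, 100_000}) {
            long[] steps = collect(capacity, 100, 50);
            System.out.println("capacity=" + capacity
                    + " markSteps=" + steps[0] + " sweepSteps=" + steps[1]);
        }
    }
}
```

With the same 100 live objects in each run, the mark step count stays at 100 while the sweep step count equals the heap capacity, which mirrors the bigger-heap-same-utilization case from the question.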

In the case of G1, the algorithm is similar, but the heap is divided into regions (each assigned to a generation) so that space can be reclaimed in a more efficient way.
