
Dealing with many large GC-eligible objects in tenured heap space

I have an application that produces large results objects and puts them in a queue. Multiple worker threads create the results objects and queue them, and a single writer thread de-queues the objects, converts them to CSV, and writes them to disk. Due to both I/O and the size of the results objects, writing the results takes far longer than generating them. This application is not a server, it is simply a command-line app that runs through a large batch of requests and finishes.

I would like to decrease the overall memory footprint of the application. Using a heap analysis tool (IBM HeapAnalyzer), I am finding that just before the program terminates, most of the large results objects are still on the heap, even though they were de-queued and have no other references to them. That is, they are all root objects. They take up the majority of the heap space.

To me, this means that they made it into tenured heap space while they were still in the queue. As no full GC is ever triggered during the run, that is where they remain. I realize that they should be tenured, otherwise I'd be copying them back and forth within the Eden spaces while they are still in the queue, but at the same time I wish there was something I could do to facilitate getting rid of them after de-queueing, short of calling System.gc() .

I realize one way of getting rid of them would be to simply shrink the maximum heap size and trigger a full GC. However the inputs to this program vary considerably in size and I would prefer to have one -Xmx setting for all runs.

Added for Clarification: this is all an issue because there is also a large memory overhead in Eden for actually writing the objects out (mostly String instances, which also appear as roots in the heap analysis). There are frequent minor GCs in Eden as a result. These would be less frequent if the result objects were not hanging around in the tenured space. The argument could be made that my real problem is the output overhead in Eden, and I am working on that, but I wanted to pursue this tenured issue at the same time.

As I research this, are there any particular garbage collector settings or programmatic approaches I should be focusing on? Note I am using JDK 1.8.

Answer Update : @maaartinus made some great suggestions that helped me avoid queueing (and thus tenuring) the large objects in the first place. He also suggested bounding the queue, which would surely cut down on the tenuring of what I am now queueing instead (the CSV byte[] representations of the results objects). The right mix of thread count and queue bounds will definitely help, though I have not tried this as the problem basically disappeared by finding a way to not tenure the big objects in the first place.

I'm sceptical concerning a GC-related solution, but it looks like you're creating a problem you needn't have:

Multiple worker threads create the results objects and queue them, and a single writer...

... writing the results takes far longer than generating them ...

So it looks like it should actually be the other way round: a single producer and many consumers, to keep the two sides in balance.

Multiple writers mightn't give you much of a speed-up, but I'd try it if possible. The number of producers doesn't matter much as long as you use a bounded queue for their results (I'm assuming they have no substantially sized input, as you haven't mentioned it). Such a bounded queue would also ensure that the queued objects never get too old.
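
A minimal sketch of such a bounded hand-off using an ArrayBlockingQueue; the class and the Result placeholder are illustrative names, not taken from the original code:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative names; none of this is taken from the original code.
public class BoundedPipeline {

    // Small fixed capacity: put() blocks once the queue is full, so at most
    // CAPACITY results are alive (and candidates for tenuring) at any time.
    private static final int CAPACITY = 8;

    private final BlockingQueue<Result> queue = new ArrayBlockingQueue<>(CAPACITY);

    // Called by the producer (worker) threads.
    void produce(Result result) throws InterruptedException {
        queue.put(result);      // blocks while the queue is full
    }

    // Called by the writer thread(s).
    Result nextResult() throws InterruptedException {
        return queue.take();    // blocks while the queue is empty
    }

    // Stand-in for the large results object from the question.
    static class Result { }
}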

In any case, you can use multiple to-CSV converters, effectively replacing each big object with a big String, byte[], ByteBuffer, or whatever (assuming you want to do the conversion in memory). The nice thing about a buffer is that you can recycle it (so the fact that it gets tenured is no longer a problem).
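
A rough sketch of that idea, assuming the conversion happens in the worker thread; the Result type and its accessors are hypothetical placeholders:

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Sketch: each worker converts its result to CSV bytes right away, so only a
// flat byte[] is queued and the large Result graph can die while still young.
public class CsvConverter {

    static byte[] toCsvBytes(Result result) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (Writer out = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)) {
            // Hypothetical accessors; substitute the real fields of the results object.
            out.write(result.getId());
            out.write(',');
            out.write(result.getValue());
            out.write('\n');
        }
        return bytes.toByteArray();  // this flat array is what goes into the queue
    }

    // Stand-in for the large results object from the question.
    static class Result {
        String getId()    { return "id"; }
        String getValue() { return "value"; }
    }
}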

You could also use some unmanaged memory, but I really don't believe it's necessary. Simply bounding the queue should be enough, unless I'm missing something.

And by the way, quite often the cheapest solution is to buy more RAM. Really, one hour of work is worth a couple of gigabytes.

Update

how much should I be worried about contention between multiple writer threads, since they would all be sharing one thread-safe Writer?

I can imagine two kinds of problems:

  • Atomicity: While synchronization ensures that each individual operation happens atomically, it doesn't mean that the output makes any sense. Imagine multiple writers, each generating a single CSV, where the resulting file should contain all the CSVs (in any order). Using a PrintWriter would keep each line intact, but it would intermix lines from different CSVs (see the sketch after this list for one way to write a whole record atomically).

  • Concurrency: For example, a FileWriter performs the conversion from chars to bytes, which in this context may end up inside a synchronized block. This could reduce parallelism a bit, but as the I/O seems to be the bottleneck, I guess it doesn't matter.
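
As a hedged illustration of the atomicity point, each writer thread could hold the monitor of the shared Writer for the duration of one complete record; SharedCsvWriter and writeRecord are made-up names, not an API of any library:

import java.io.IOException;
import java.io.Writer;

// Sketch of several writer threads sharing one Writer: each thread holds the
// monitor for one complete CSV record, so records from different threads
// cannot interleave. The class and method names are made up for illustration.
public class SharedCsvWriter {

    private final Writer out;

    SharedCsvWriter(Writer out) {
        this.out = out;
    }

    // Called concurrently by the writer threads; csvRecord is one full record
    // (or one full per-result CSV block) already converted to text.
    void writeRecord(String csvRecord) throws IOException {
        synchronized (out) {
            out.write(csvRecord);
            out.write(System.lineSeparator());
        }
    }
}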
