
How to analyze memory fragmentation in Java?

We experience lags of several minutes on our server. They are probably triggered by "stop the world" garbage collections. But we use the concurrent mark-sweep GC (-XX:+UseConcMarkSweepGC), so I think these pauses are triggered by memory fragmentation of the old generation.

How can memory fragmentation of old generation be analyzed? Are there any tools for it?

The lags happen every hour. Most of the time they last about 20 seconds, but sometimes they last several minutes.

Look at your Java documentation for the "java -X..." options for turning on GC logging. That will tell you whether you are collecting old or new generation, and how long the collections are taking.

A pause of "several minutes" sounds extraordinary. Are you sure that you aren't just running with a heap size that is too small, or on a machine with not enough physical memory?

  • If your heap is too close to full, the GC will be triggered again and again, and your server will spend most of its CPU time in the GC. This will show up in the GC logs.

  • If you use a large heap on a machine without enough physical memory, a full GC is liable to cause your machine to "thrash", spending most of its time moving virtual memory pages to and from disk. You can observe this using system monitoring tools, e.g. by watching the console output from "vmstat 5" on a typical UNIX/Linux system.
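As a complement to the GC logs, collection counts and accumulated collection times can also be read from inside the JVM through the standard java.lang.management API. The sketch below is only an illustration (the class name GcStats is made up here); a count that climbs quickly between two snapshots would be consistent with the "triggered again and again" case above.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: summarizes every collector registered in this JVM.
final class GcStats {

    // Returns one line per collector with its collection count and total time.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<String>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": count=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        for (String line : snapshot()) {
            System.out.println(line);
        }
    }
}
```

Calling snapshot() periodically (for example once a minute) and comparing consecutive values shows how much time was spent collecting in each interval.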

FOLLOWUP

Contrary to the OP's belief, turning on GC logging is unlikely to make a noticeable difference to performance.

The Understanding Concurrent Mark Sweep Garbage Collector Logs page on the Oracle site should be helpful in interpreting GC logs.

Finally, the OP's conclusion that this is a "fragmentation" problem is unlikely, and (IMO) unsupported by the snippets of evidence that he has provided. It is most likely something else.

For low-level monitoring, use -XX:PrintFLSStatistics=1 (or set it to 2 for more detail, at a higher blocking cost). It is undocumented and only occasionally prints statistics. Unfortunately, for various reasons it is not very useful in most applications, but it is at least useful for a ballpark estimate.

You should be able to see for example

Max Chunk Size: 215599441

and compare it to this

Total Free Space: 219955840

and then judge the fragmentation based on the average block sizes and number of blocks.
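One rough way to turn those two numbers into a single figure is to compute 1 - (Max Chunk Size / Total Free Space). This ratio is a heuristic assumed here for illustration, not something the JVM reports, and the class and method names below are hypothetical. A value near 0 means almost all free space sits in one chunk; a value near 1 means the free space is scattered.

```java
// Hypothetical helper for reading the two statistics quoted above.
final class FreeListFragmentation {

    // Heuristic (an assumption, not a JVM-defined metric):
    // 0.0 = all free space is one contiguous chunk, close to 1.0 = heavily scattered.
    static double ratio(long maxChunkSize, long totalFreeSpace) {
        if (totalFreeSpace <= 0) {
            return 0.0; // nothing free, so nothing to measure
        }
        return 1.0 - (double) maxChunkSize / (double) totalFreeSpace;
    }

    // Extracts the number that follows "label:" in a single log line.
    static long valueOf(String line, String label) {
        String prefix = label + ":";
        int start = line.indexOf(prefix);
        if (start < 0) {
            throw new IllegalArgumentException("label not found: " + label);
        }
        return Long.parseLong(line.substring(start + prefix.length()).trim());
    }

    public static void main(String[] args) {
        long maxChunk = valueOf("Max Chunk Size: 215599441", "Max Chunk Size");
        long totalFree = valueOf("Total Free Space: 219955840", "Total Free Space");
        System.out.printf("fragmentation ratio: %.4f%n", ratio(maxChunk, totalFree));
    }
}
```

For the sample values above the ratio comes out at roughly 2%, which would suggest very little fragmentation at that moment.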

I have used YourKit to solve this kind of problem.

Vitaly, there is a fragmentation problem. My observation: if there are small objects that are updated frequently, they generate a lot of garbage. Although CMS reclaims the memory occupied by these objects, that memory is fragmented. Then the Mark-Sweep-Compact collector comes into the picture (stop the world) and tries to compact the fragmented memory, causing a long pause.

Conversely, if the objects are bigger, they generate less fragmented memory and Mark-Sweep-Compact takes less time to compact it. This may reduce throughput, but it will help you reduce the long pauses caused by GC compaction.

This is a hard problem to pin down. Since I spent some time finding and proving it in one system, let me list the scenario in which it happened:

  • We were stuck on Java 6, which did not have any compacting garbage collector
  • Our application was doing too much GC, mostly young generation collections and some big old generation collections
  • Our heap size was pretty big, which was the main problem (we reduced it, but our application was guzzling too many strings and collections)

The problem manifested as only one particular algorithm in our system running slowly; everything else running at the same time ran quite normally. This ruled out a full GC. We were also using jstat and other j** tools to check GC, taking thread dumps, and tailing the GC logs.

From jstack thread dumps taken over a period of time, we could get an idea of which code block was actually slow. So suspicion fell on heap fragmentation.

To test that, I wrote a simple program that initialized two lists, one ArrayList and one LinkedList, and performed add operations that caused resizing. I could trigger this test through a REST handle. Normally there is not much difference. But inside a fragmented heap there is a clear difference in timing: resizing a big ArrayList becomes much slower than adding to a LinkedList. These timings were logged, and there was no other explanation for them than a fragmented heap.
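The original test program is not shown, so the following is only a minimal sketch of that kind of probe, with a made-up class name and element count and without the REST wiring. It times the same number of add operations on an ArrayList (which must periodically allocate one large contiguous backing array) and on a LinkedList (which allocates many small nodes).

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;

// Hypothetical probe: compares growth cost of the two list types.
final class ListGrowthProbe {

    // Appends 'count' integers to the given list and returns the elapsed nanoseconds.
    static long timeAddsNanos(List<Integer> list, int count) {
        long start = System.nanoTime();
        for (int i = 0; i < count; i++) {
            list.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int count = 1000000; // arbitrary size chosen for illustration

        // Default initial capacity, so the ArrayList has to resize many times.
        long arrayNanos = timeAddsNanos(new ArrayList<Integer>(), count);
        long linkedNanos = timeAddsNanos(new LinkedList<Integer>(), count);

        System.out.println("ArrayList  adds: " + arrayNanos / 1000000 + " ms");
        System.out.println("LinkedList adds: " + linkedNanos / 1000000 + " ms");
    }
}
```

A single run says little because of JIT warm-up and ordinary GC noise; the signal described above came from logging such timings repeatedly inside the long-running server and comparing them over time.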

With Java 7 we switched to G1GC, along with a lot of work on GC tuning and on improving the applications. Heap compaction is much better there and it can handle bigger heaps, though I guess anything over a 16 GB heap will land you in places you don't really want to go: GC suckage :)

To see how Vitaly can approach this problem, see Understanding Concurrent Mark Sweep Garbage Collector Logs.

There is no memory fragmentation in Java; during the GC run, memory areas are compacted.

Since you don't see a high CPU utilization, there is no GC running, either. So something else must be the cause of your problems. Here are a few ideas:

  • If the database of your application is on a different server, there may be network problems

  • If you run Windows and you have mapped network drives, one of the drives may lock up your computer (again network problems). The same is true for NFS drives on Unix. Check the system log for network errors.

  • Is the computer swapping lots of data to disk? Since CPU util is low, the cause of the problem could be that the app was swapped to disk and the GC run forced it back into RAM. This will take a long time if your server hasn't enough real RAM to keep the whole Java app in RAM.

Also, other processes can force the app out of RAM. Check the real memory utilization and your swap space usage.

To understand the output of the GC log, this post might help.

[EDIT] I still can't get my head around "low CPU" and "GC stalls". Those two usually contradict each other. If the GC is stalling, you should see 100% CPU usage. If the CPU is idle, then something else is blocking the GC. Do you have objects which override finalize()? If a finalize() call blocks, the GC can take forever.
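One way to check the finalize() theory is to ask the JVM how many objects are waiting to be finalized, using the standard MemoryMXBean. The class name below is hypothetical and the sketch only reads the counter; a number that keeps growing across samples would be consistent with a blocked or slow finalizer.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

// Hypothetical helper: reports the finalization backlog of this JVM.
final class FinalizerBacklog {

    // Approximate number of objects whose finalization is still pending.
    static int pending() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getObjectPendingFinalizationCount();
    }

    public static void main(String[] args) {
        System.out.println("objects pending finalization: " + pending());
    }
}
```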


 