
Garbage Collection settings with OpenJDK8

I need help tuning one of our Microservices.

We are running a Spring-based microservice (Spring Integration, Spring Data JPA) on a Jetty server in an OpenJDK8 container. We also use Mesosphere as our container orchestration platform.

The application consumes messages from IBM MQ, does some processing and then stores the processed output in an Oracle DB.

We noticed that at some point on the 2nd of May our application stopped processing the queue. Our MQ team could still see open connections against the queue, but the application was just not reading anymore. It did not die totally, as the health check API that DCOS hits still shows it as healthy.

[Image: AppD chart of time spent in GC per date]

We use AppD for performance monitoring, and it shows that a garbage collection ran on that same date, after which the application never picked up messages from the queue again. The graph above shows the amount of time spent doing GC on the different dates.

As part of the Java Opts we use to run the application we state

-Xmx1024m

The Mesosphere reservation for each instance of that microservice is shown below

[Image: Mesosphere reservation for the microservice]

Can someone please point me in the right direction to configure the right Garbage Collection settings for my application?

Also, if you think that the GC is just a symptom, please share your views on potential flaws I should be looking for.

Cheers, Kris

You should check your code.

A GC operation triggers an STW (Stop The World) pause, which blocks all the threads created by your code. But an STW pause doesn't change the state of the running code; the threads simply resume where they left off.

GC can still affect your logic, though, if you use something like System.currentTimeMillis() to control it, because a pause makes wall-clock time jump between two readings.

A GC also affects non-strong references: if you use WeakReference, SoftReference or WeakHashMap, these components may behave differently after a full GC.
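To make the point about non-strong references concrete, here is a minimal sketch. The class name WeakRefDemo is made up for illustration, and System.gc() is only a hint to the JVM, so the second result is likely but not guaranteed.

```java
import java.lang.ref.WeakReference;

// Hypothetical demo class; not taken from the application in the question.
final class WeakRefDemo {

    /** A weak reference keeps resolving while a strong reference to the object exists. */
    static boolean survivesWhileStronglyHeld() {
        Object strong = new Object();
        WeakReference<Object> ref = new WeakReference<>(strong);
        System.gc(); // only a hint; the JVM may ignore it
        // 'strong' is used after the GC request, so the object stays strongly reachable
        return ref.get() == strong;
    }

    /** Without a strong reference the referent is usually cleared by a GC (not guaranteed). */
    static boolean clearedAfterGc() {
        WeakReference<Object> ref = new WeakReference<>(new Object());
        System.gc();
        return ref.get() == null;
    }

    public static void main(String[] args) {
        System.out.println("held strongly, still present: " + survivesWhileStronglyHeld());
        System.out.println("no strong ref, cleared: " + clearedAfterGc());
    }
}
```

If code relies on a value still being in a WeakHashMap or behind a SoftReference, the second case is the one that changes behaviour after a full GC.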

If a full GC completes and the freed memory is still not enough for your code to allocate a new object, your code will throw an OutOfMemoryError, which interrupts its execution.

I think the things you should do now are:

First, check the 'GC Cause' to determine whether the full GC was triggered by a System.gc() call or by an Allocation Failure.

Then, if the GC cause is System.gc(), you should check the non-strong references used in your code.

Finally, if the GC cause is Allocation Failure, you should check your logs to determine whether an OutOfMemoryError happened in your code. If it did, you should allocate more memory to avoid the OutOfMemoryError.
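If GC logging is not enabled, the cause can also be observed from inside the JVM. The sketch below assumes a HotSpot-based JVM (it uses the com.sun.management notification API), and the class name GcCauseLogger is made up for illustration. On HotSpot the cause strings are typically "System.gc()" and "Allocation Failure".

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

// Hypothetical helper; assumes a HotSpot JVM where GC beans emit notifications.
final class GcCauseLogger {

    /** Registers a listener on every collector and returns how many were registered. */
    static int install() {
        NotificationListener listener = (notification, handback) -> {
            if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                    .equals(notification.getType())) {
                return; // not a GC notification
            }
            GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                    .from((CompositeData) notification.getUserData());
            System.out.printf("%s | %s | cause=%s | %d ms%n",
                    info.getGcName(), info.getGcAction(), info.getGcCause(),
                    info.getGcInfo().getDuration());
        };
        int registered = 0;
        for (GarbageCollectorMXBean bean : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (bean instanceof NotificationEmitter) {
                ((NotificationEmitter) bean).addNotificationListener(listener, null, null);
                registered++;
            }
        }
        return registered;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("listeners registered: " + install());
        System.gc();       // a hint; when honoured it is reported with cause "System.gc()"
        Thread.sleep(500); // notifications are delivered asynchronously
    }
}
```

Calling install() once at startup is enough; each collection then prints one line with its cause.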

As a suggestion, you SHOULD NOT keep your MQ messages in your microservice's memory. Mostly, the source of GC problems is bad practice in the code.

I don't think that garbage collection is at fault here, or that you should be attempting to fix this by tweaking GC parameters.

I think it is one of two things:

  1. A coincidence. A correlation (for a single data point) that doesn't imply causation.

  2. Something about garbage collection, or the event that triggered the garbage collection has caused something to break in your application.

For the latter, there are any number of possibilities. But one that springs to mind is that something (eg a request) caused an application thread to allocate a really large object. That triggered a full GC in an attempt to find space. The GC failed; ie there still wasn't enough space after the GC did its best. That then turned into an OOME which killed the thread.

If the (hypothetical) thread that was killed by the OOME was critical to the operation of the application, AND the rest of the application didn't "notice" it had died, then the application as a whole would break.
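One way to make the rest of the application "notice" is to tie the health check to the liveness of the critical thread. This is only a sketch: ConsumerHealth and the thread name mq-consumer are made up, the OOME is simulated, and how Spring Integration actually manages its listener threads is not something I can see from the question.

```java
import java.util.concurrent.CountDownLatch;

// Hypothetical health indicator; the real consumer thread would come from the application.
final class ConsumerHealth {
    private final Thread consumer;

    ConsumerHealth(Thread consumer) {
        this.consumer = consumer;
    }

    /** Healthy only while the consumer thread is still running. */
    boolean isHealthy() {
        return consumer.isAlive();
    }

    /** Returns {healthy while running, healthy after the thread died}. */
    static boolean[] demo() {
        CountDownLatch release = new CountDownLatch(1);
        Thread consumer = new Thread(() -> {
            try {
                release.await(); // stand-in for "waiting for messages"
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
            // Simulated error, not a real allocation failure
            throw new OutOfMemoryError("simulated");
        }, "mq-consumer");
        ConsumerHealth health = new ConsumerHealth(consumer);
        consumer.start();
        boolean before = health.isHealthy();
        release.countDown();
        try {
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        boolean after = health.isHealthy();
        return new boolean[] {before, after};
    }

    public static void main(String[] args) {
        boolean[] result = demo();
        System.out.println("healthy while running: " + result[0]);
        System.out.println("healthy after thread death: " + result[1]);
    }
}
```

A health endpoint built this way would have gone unhealthy when the reader thread died, so the orchestrator could have restarted the container.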

One clue to look for would be an OOME logged when the thread died. But it is also possible (if the application is not written / configured appropriately) for the OOME not to appear in the logs.
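A default uncaught exception handler is one way to make sure such a thread death leaves a trace. A minimal sketch, assuming nothing else in the application already installs one; ThreadDeathLogger is a made-up name, the OOME is simulated, and a real application would write to its logging framework rather than System.err.

```java
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical helper that records why a thread died.
final class ThreadDeathLogger {
    static final AtomicReference<String> lastDeath = new AtomicReference<>();

    /** Installs a JVM-wide handler for exceptions and errors that escape a thread's run(). */
    static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            String message = "Thread '" + thread.getName() + "' died: " + error;
            lastDeath.set(message);
            System.err.println(message);
        });
    }

    /** Kills a worker with a simulated OOME and returns what the handler recorded. */
    static String demo() {
        install();
        Thread worker = new Thread(() -> {
            throw new OutOfMemoryError("simulated"); // not a real allocation failure
        }, "mq-consumer");
        worker.start();
        try {
            worker.join(); // the handler runs on the dying thread before join() returns
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastDeath.get();
    }

    public static void main(String[] args) {
        System.out.println("recorded: " + demo());
    }
}
```

Note that the handler only sees errors that actually escape the thread; a catch (Throwable t) block that swallows the OOME would still hide it.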

Regarding the AppD chart: is that time in seconds? How many full GCs do you have? Perhaps you should enable the garbage collector log.
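On JDK 8 the GC log is usually enabled with flags such as -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<file>. If changing the startup flags is not convenient, the counts and accumulated times can also be read from the standard GarbageCollectorMXBean API. A small sketch (GcStats is a made-up name):

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper that summarises GC activity since JVM start.
final class GcStats {

    /** One line per collector: name, number of collections, accumulated time in ms. */
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Both values can be -1 if the collector does not report them
            lines.add(String.format("%s: count=%d, totalTimeMs=%d",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

The collector names also tell you which collector is in use: on HotSpot, "Copy" and "MarkSweepCompact" indicate the serial collector, while "PS Scavenge" and "PS MarkSweep" indicate the parallel one.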

Thanks for your contribution guys. We will be attempting to increase the CPU allocation from 0.5 CPU to 1.25 CPU, and execute another round of NFT tests.

We tried running the command below

jmap -dump:format=b,file=$FILENAME.bin $PID

to get a heap dump, but the utility is not present on the default OpenJDK8 container.
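When jmap is missing, a heap dump can still be produced from inside the JVM through the HotSpot diagnostic MXBean. This is a sketch that assumes a HotSpot-based JVM; HeapDumper is a made-up name, and how you would trigger it (an admin endpoint, a signal, a scheduled check) is left open.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

// Hypothetical helper; relies on the HotSpot-specific com.sun.management API.
final class HeapDumper {

    /** Writes a heap dump of this JVM to the given path, which must not already exist. */
    static void dump(String path, boolean liveObjectsOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveObjectsOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Newer JDKs require the file name to end in .hprof
        String path = args.length > 0 ? args[0] : "heap-" + System.currentTimeMillis() + ".hprof";
        dump(path, true);
        System.out.println("heap dump written to " + path);
    }
}
```

Starting the JVM with -XX:+HeapDumpOnOutOfMemoryError is another option that needs no extra tooling in the container, since the JVM writes the dump itself when an OOME occurs.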

I have just seen your comments about CPU

increase the CPU allocation from 0.5 CPU to 1.25 CPU

Please keep in mind that the parallel GC needs at least two cores to be of any benefit. I think that with your configuration you are using the serial collector, and there is no reason to use a serial garbage collector nowadays when you can make use of multiple cores. Have you considered trying at least two cores? I often use four as the minimum for my application servers in production and performance environments.

You can see more information here:

On a machine with N hardware threads where N is greater than 8, the parallel collector uses a fixed fraction of N as the number of garbage collector threads. The fraction is approximately 5/8 for large values of N. At values of N below 8, the number used is N. On selected platforms, the fraction drops to 5/16. The specific number of garbage collector threads can be adjusted with a command-line option (which is described later). On a host with one processor, the parallel collector will likely not perform as well as the serial collector because of the overhead required for parallel execution (for example, synchronization). However, when running applications with medium-sized to large-sized heaps, it generally outperforms the serial collector by a modest amount on machines with two processors, and usually performs significantly better than the serial collector when more than two processors are available.

Source: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/parallel.html
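The rule in the quote can be sketched as follows. The exact formula (N up to 8, then 8 plus 5/8 of the remainder) is my assumption about the HotSpot default and is not spelled out in the quoted text; ParallelGcThreads is a made-up name. The value the JVM really uses can be checked with -XX:+PrintFlagsFinal and overridden with -XX:ParallelGCThreads=<n>.

```java
// Hypothetical helper; the formula is an assumed approximation of the HotSpot default.
final class ParallelGcThreads {

    /** Estimated default number of parallel GC threads for N hardware threads. */
    static int estimate(int hardwareThreads) {
        if (hardwareThreads <= 8) {
            return hardwareThreads;
        }
        return 8 + (hardwareThreads - 8) * 5 / 8; // integer arithmetic, rounds down
    }

    public static void main(String[] args) {
        // What the JVM believes it has; inside a container this may differ from the reservation
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("available processors: " + cpus);
        System.out.println("estimated parallel GC threads: " + estimate(cpus));
    }
}
```

Printing availableProcessors() from inside the container is a quick way to see what the 0.5 or 1.25 CPU reservation looks like to the JVM.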

Raúl
