Garbage Collection settings with OpenJDK8

I need help tuning one of our microservices.

We are running a Spring-based microservice (Spring Integration, Spring Data JPA) on a Jetty server in an OpenJDK8 container. We are also using Mesosphere as our container orchestration platform.

The application consumes messages from IBM MQ, does some processing and then stores the processed output in an Oracle DB.

We noticed that at some point on the 2nd of May our application stopped processing the queue. Our MQ team could still see open connections against the queue, but the application was just not reading anymore. It did not die totally, as the healthCheck API that DCOS hits still shows as healthy.

[Image: graph of time spent doing GC on different dates]

We use AppD for performance monitoring, and what we could see is that on the same date a garbage collection was done, and from then on the application never picked up messages from the queue. The graph above shows the amount of time spent doing GC on the different dates.

As part of the Java opts we use to run the application, we set

-Xmx1024m

The Mesosphere reservation for each instance of that microservice is as shown below

[Image: Mesosphere reservation for the microservice]

Can someone please point me in the right direction to configure the right garbage collection settings for my application?

Also, if you think that the GC is just a symptom, please share your views on the potential flaws I should be looking for.

Cheers, Kris

You should check your code.

A GC operation will trigger an STW (Stop The World) pause, which blocks all the threads created in your code. But STW doesn't affect the state of the running code.

But GC will affect your code logic if you use something like System.currentTimeMillis to control how your code runs.

A GC operation will also affect non-strong references. If you use WeakReference, SoftReference, or WeakHashMap, these components may change their behavior after a full GC.
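To make the point about non-strong references concrete, here is a minimal sketch. The class and method names are made up for illustration. Note that System.gc() is only a hint to the JVM, so the second printout is not guaranteed to show the entry gone.

```java
import java.lang.ref.WeakReference;
import java.util.Map;
import java.util.WeakHashMap;

public class WeakRefDemo {

    /** While a strong reference to the object exists, a GC must not clear the weak one. */
    public static boolean survivesWhileStronglyHeld() {
        Object key = new Object();
        WeakReference<Object> ref = new WeakReference<>(key);
        System.gc(); // a hint only; the JVM may or may not collect now
        // Comparing against key keeps it strongly reachable across the GC request.
        return ref.get() == key;
    }

    public static void main(String[] args) {
        System.out.println("alive while strongly held: " + survivesWhileStronglyHeld());

        Map<Object, String> cache = new WeakHashMap<>();
        cache.put(new Object(), "payload"); // nothing else references this key
        System.gc();
        // After a collection the entry may have disappeared; code that relies
        // on it still being there changes behavior at that point.
        System.out.println("cache size after GC request: " + cache.size());
    }
}
```

If part of the message handling depends on state held only through weak or soft references, this is the kind of silent change a full GC can cause.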

If a full GC completes and the freed memory still doesn't allow your code to allocate a new object, your code will throw an OutOfMemoryError, which will interrupt its execution.

I think the things you should do now are:

First, check the 'GC Cause' to determine whether the full GC was caused by a System.gc() call or by an allocation failure.

Then, if the GC cause is System.gc(), you should check the non-strong references used in your code.

Finally, if the GC cause is an allocation failure, you should check your logs to determine whether an OutOfMemoryError occurred in your code. If it did, you should allocate more memory to avoid the OutOfMemoryError.
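One way to see the GC cause from inside the application is to subscribe to GC notifications over JMX. This is only a sketch and assumes a HotSpot-based JVM such as OpenJDK 8, where the com.sun.management classes are available. The class name GcCauseLogger is invented for the example.

```java
import com.sun.management.GarbageCollectionNotificationInfo;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import javax.management.NotificationEmitter;
import javax.management.NotificationListener;
import javax.management.openmbean.CompositeData;

public class GcCauseLogger {

    /** Registers a listener on every collector; returns how many were registered. */
    public static int install() {
        int registered = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            if (!(gc instanceof NotificationEmitter)) {
                continue; // assumption: on HotSpot the collector beans emit notifications
            }
            NotificationListener listener = (notification, handback) -> {
                if (!GarbageCollectionNotificationInfo.GARBAGE_COLLECTION_NOTIFICATION
                        .equals(notification.getType())) {
                    return;
                }
                GarbageCollectionNotificationInfo info = GarbageCollectionNotificationInfo
                        .from((CompositeData) notification.getUserData());
                // The cause is typically "System.gc()" or "Allocation Failure".
                System.out.printf("GC %s (%s), cause=%s, duration=%d ms%n",
                        info.getGcName(), info.getGcAction(),
                        info.getGcCause(), info.getGcInfo().getDuration());
            };
            ((NotificationEmitter) gc).addNotificationListener(listener, null, null);
            registered++;
        }
        return registered;
    }

    public static void main(String[] args) {
        System.out.println("listeners installed: " + install());
        System.gc(); // request a collection so there is something to report
    }
}
```

Calling install() once at startup is enough. The output goes to stdout here, so in a real service you would probably route it to your logger instead.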

As a suggestion, you SHOULD NOT keep your MQ messages in your microservice's memory. Mostly, the source of GC problems is bad practice in your code.

I don't think that garbage collection is at fault here, or that you should be attempting to fix this by tweaking GC parameters.

I think it is one of two things:

  1. A coincidence. A correlation (for a single data point) that doesn't imply causation.

  2. Something about the garbage collection, or the event that triggered the garbage collection, has caused something to break in your application.

For the latter, there are any number of possibilities. But one that springs to mind is that something (e.g. a request) caused an application thread to allocate a really large object. That triggered a full GC in an attempt to find space. The GC failed; i.e. there still wasn't enough space after the GC did its best. That then turned into an OOME which killed the thread.

If the (hypothetical) thread that was killed by the OOME was critical to the operation of the application, AND the rest of the application didn't "notice" it had died, then the application as a whole would break.

One clue to look for would be an OOME logged when the thread died. But it is also possible (if the application is not written / configured appropriately) for the OOME not to appear in the logs.
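A default uncaught-exception handler is one way to make sure a dying thread leaves a trace. The sketch below uses invented names (ThreadDeathLogger, lastFailure) and throws an ordinary exception to stand in for the OOME. Under real memory exhaustion even the logging call could fail, so treat it as best effort.

```java
import java.util.concurrent.atomic.AtomicReference;

public class ThreadDeathLogger {

    /** Last error that killed a thread; kept so that other code can inspect it. */
    static final AtomicReference<Throwable> lastFailure = new AtomicReference<>();

    /** Installs a JVM-wide handler for exceptions that escape a thread's run method. */
    public static void install() {
        Thread.setDefaultUncaughtExceptionHandler((thread, error) -> {
            lastFailure.set(error);
            System.err.println("Thread '" + thread.getName() + "' died: " + error);
        });
    }

    /** Starts a worker that fails, waits for it, and returns what the handler saw. */
    public static Throwable runFailingWorker() {
        install();
        Thread worker = new Thread(() -> {
            throw new IllegalStateException("simulated failure");
        }, "queue-reader");
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return lastFailure.get();
    }

    public static void main(String[] args) {
        System.out.println("handler saw: " + runFailingWorker());
    }
}
```

Threads managed by a framework or an executor may catch the Throwable themselves, in which case this handler never fires and the framework's own error handling is the place to look.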

Regarding the AppD chart: is that time in seconds? How many full GCs do you have? Perhaps you should enable the log for the garbage collector.
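If enabling the GC log is not convenient, the collection counts and accumulated times can also be read from inside the JVM using the standard java.lang.management API. A small sketch, with GcStats as a made-up class name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class GcStats {

    /** One line per collector: name, number of collections, total time in milliseconds. */
    public static String summary() {
        StringBuilder sb = new StringBuilder();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            sb.append(String.format("%s: count=%d, time=%d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.print(summary());
    }
}
```

The old-generation collector's count is usually the closest thing to "number of full GCs", though exactly what each bean counts depends on the collector in use.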

Thanks for your contributions, guys. We will be attempting to increase the CPU allocation from 0.5 CPU to 1.25 CPU, and execute another round of NFT tests.

We tried running the command below

jmap -dump:format=b,file=$FILENAME.bin $PID

to get a heap dump, but the utility is not present on the default OpenJDK8 container.
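When jmap is missing from the image, a heap dump can also be triggered from inside the process. The sketch below assumes a HotSpot-based JVM, where HotSpotDiagnosticMXBean is available. HeapDumper and the file location are illustrative only.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.lang.management.ManagementFactory;

public class HeapDumper {

    /**
     * Writes a heap dump to the given path. The file must not exist yet.
     * liveOnly = true dumps only objects that are still reachable.
     */
    public static void dump(String path, boolean liveOnly) {
        HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        try {
            bean.dumpHeap(path, liveOnly);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical location; pick a directory that is writable in your container.
        File target = new File(System.getProperty("java.io.tmpdir"),
                "heap-" + System.currentTimeMillis() + ".hprof");
        dump(target.getPath(), true);
        System.out.println("wrote " + target + " (" + target.length() + " bytes)");
    }
}
```

You would need some way to call this in the running service, for example an admin-only endpoint. Starting the JVM with -XX:+HeapDumpOnOutOfMemoryError is another option that needs no code at all.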

I have just seen your comments about CPU

increase the CPU allocation from 0.5 CPU to 1.25 CPU

Please keep in mind that in order to run the parallel GC you need at least two cores. I think with your configuration you are using the serial collector, and there is no reason to use a serial garbage collector nowadays when you can leverage multiple cores. Have you considered trying at least two cores? I often use four as a minimum for my application servers in production and performance environments.
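To check which collector the container actually ended up with, you can print the processor count the JVM sees together with the names of the active collectors. A minimal sketch, with GcEnvironment as an invented class name:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

public class GcEnvironment {

    /** Number of processors the JVM believes it can use. */
    public static int cpus() {
        return Runtime.getRuntime().availableProcessors();
    }

    /** Names of the garbage collectors active in this JVM. */
    public static List<String> collectors() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("available processors: " + cpus());
        System.out.println("collectors: " + collectors());
    }
}
```

As far as I know, on HotSpot in JDK 8 the serial collector shows up as "Copy" and "MarkSweepCompact", and the parallel collector as "PS Scavenge" and "PS MarkSweep". Be aware that older JDK 8 builds may report the host's processor count rather than the container's CPU limit.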

You can see more information here:

On a machine with N hardware threads where N is greater than 8, the parallel collector uses a fixed fraction of N as the number of garbage collector threads. The fraction is approximately 5/8 for large values of N. At values of N below 8, the number used is N. On selected platforms, the fraction drops to 5/16. The specific number of garbage collector threads can be adjusted with a command-line option (which is described later). On a host with one processor, the parallel collector will likely not perform as well as the serial collector because of the overhead required for parallel execution (for example, synchronization). However, when running applications with medium-sized to large-sized heaps, it generally outperforms the serial collector by a modest amount on machines with two processors, and usually performs significantly better than the serial collector when more than two processors are available.
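The thread-count rule in the quote can be written down as a small function. The exact formula used here, 8 + (N - 8) * 5/8 for N above 8, is my assumption of how HotSpot applies the 5/8 fraction on most platforms. The quoted text itself only says "approximately 5/8".

```java
public class ParallelGcThreads {

    /**
     * Estimated default number of parallel GC threads for n hardware threads.
     * Assumption: n threads up to 8, then 5/8 of every additional one.
     */
    public static int defaultThreads(int n) {
        if (n <= 8) {
            return n;
        }
        return 8 + (n - 8) * 5 / 8; // integer arithmetic, rounds down
    }

    public static void main(String[] args) {
        for (int n : new int[] {1, 2, 4, 8, 16, 32}) {
            System.out.println(n + " hardware threads -> " + defaultThreads(n) + " GC threads");
        }
    }
}
```

The command-line option the quote refers to is -XX:ParallelGCThreads, which overrides this default.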

Source: https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/parallel.html

Raúl
