
How to Optimize JVM & GC through Load Testing

Original post: 2012-01-05 16:09:37. Tags: java, optimization, garbage-collection, jvm, load-testing

Edit: From the several extremely generous and helpful responses this question has already received, it is obvious to me that I didn't make an important part of this question clear when I asked it earlier this morning. The answers I've received so far are more about optimizing applications and removing bottlenecks at the code level. I am aware that this is way more important than trying to get an extra 3 or 5% out of your JVM!

This question assumes we've already done just about everything we could to optimize our application architecture at the code level. Now we want more, and the next place to look is the JVM level and garbage collection; I've changed the question title accordingly. Thanks again!

We've got a "pipeline" style backend architecture where messages pass from one component to the next, with each component performing different processes at each step of the way.

Components live inside WAR files deployed on Tomcat servers. Altogether we have about 20 components in the pipeline, living on 5 different Tomcat servers (I didn't choose the architecture or the distribution of WARs for each server). We use Apache Camel to create all the routes between the components, effectively forming the "connective tissue" of the pipeline.

I've been asked to optimize the GC and general performance of each server running a JVM (5 in all). I've spent several days now reading up on GC and performance tuning, and have a pretty good handle on what each of the different JVM options does, how the heap is organized, and how most of the options affect the overall performance of the JVM.

My thinking is that the best way to optimize each JVM is not to optimize it as a standalone. I "feel" (that's about as far as I can justify it!) that trying to optimize each JVM locally without considering how it will interact with the other JVMs on other servers (both upstream and downstream) will not produce a globally optimized solution.

To me it makes sense to optimize the entire pipeline as a whole. So my first question is: does SO agree, and if not, why?

To do this, I was thinking about creating a LoadTester that would generate input and feed it to the first endpoint in the pipeline. This LoadTester might also have a separate "Monitor Thread" that would check the last endpoint for throughput. I could then do all sorts of processing where we check for average end-to-end travel time for messages, maximum throughput before faulting, etc.
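The LoadTester idea can be sketched in plain Java as a producer, a stand-in pipeline, and a monitor thread. This is a minimal sketch under loud assumptions: the two `BlockingQueue`s are hypothetical stand-ins for the first and last Camel endpoints, the "pipeline" thread merely forwards messages, and `LoadTesterSketch`, `Message` and `Result` are invented names. Each message carries its send time so the monitor can compute end-to-end latency and overall throughput.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

public class LoadTesterSketch {

    /** A message stamped with its send time so end-to-end latency can be computed. */
    static final class Message {
        final long id;
        final long sentNanos;
        Message(long id, long sentNanos) { this.id = id; this.sentNanos = sentNanos; }
    }

    /** Outcome of one run. */
    static final class Result {
        final long received;
        final double meanLatencyMillis;
        final double throughputPerSec;
        Result(long received, double meanLatencyMillis, double throughputPerSec) {
            this.received = received;
            this.meanLatencyMillis = meanLatencyMillis;
            this.throughputPerSec = throughputPerSec;
        }
    }

    static Result run(int messageCount) {
        // Hypothetical stand-ins for the first and last endpoints of the pipeline.
        BlockingQueue<Message> firstEndpoint = new LinkedBlockingQueue<>();
        BlockingQueue<Message> lastEndpoint = new LinkedBlockingQueue<>();

        // Stand-in for the ~20 components: here it just forwards each message.
        Thread pipeline = new Thread(() -> {
            try {
                for (int i = 0; i < messageCount; i++) {
                    lastEndpoint.put(firstEndpoint.take());
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Monitor thread: drains the last endpoint and accumulates end-to-end latency.
        AtomicLong received = new AtomicLong();
        AtomicLong totalLatencyNanos = new AtomicLong();
        Thread monitor = new Thread(() -> {
            try {
                while (received.get() < messageCount) {
                    Message m = lastEndpoint.poll(5, TimeUnit.SECONDS);
                    if (m == null) {
                        break; // pipeline stalled; report what arrived so far
                    }
                    totalLatencyNanos.addAndGet(System.nanoTime() - m.sentNanos);
                    received.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        long start = System.nanoTime();
        pipeline.start();
        monitor.start();
        try {
            // Producer: the same deterministic input pattern on every run.
            for (long id = 0; id < messageCount; id++) {
                firstEndpoint.put(new Message(id, System.nanoTime()));
            }
            monitor.join();
            pipeline.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("load test interrupted", e);
        }
        double elapsedSec = (System.nanoTime() - start) / 1e9;
        long n = received.get();
        double meanMillis = n == 0 ? 0.0 : totalLatencyNanos.get() / (double) n / 1e6;
        return new Result(n, meanMillis, n / elapsedSec);
    }

    public static void main(String[] args) {
        Result r = run(10_000);
        System.out.printf("received=%d mean=%.3f ms throughput=%.0f msg/s%n",
                r.received, r.meanLatencyMillis, r.throughputPerSec);
    }
}
```

Because the input pattern is deterministic, two runs that differ only in JVM options can be compared directly. In the real setup the queues would be replaced by whatever transport the first and last Camel endpoints use, which is not shown here.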

The LoadTester would generate the same pattern of input messages over and over again. The variable in this experiment would be the JVM options passed in each Tomcat server's startup configuration. I have a list of about 20 different options I'd like to pass to the JVMs, and figured I could just keep tweaking their values until I found near-optimal performance.
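Treating the JVM options as the experiment's variable means enumerating the configurations to try. This is a minimal sketch, assuming Tomcat picks up extra JVM flags from `CATALINA_OPTS` (commonly set in `bin/setenv.sh`). The flags shown are real HotSpot options, but the candidate values and the `JvmOptionGrid` name are illustrative only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class JvmOptionGrid {

    /** Cartesian product of candidate values; each combination becomes one options string. */
    static List<String> combinations(Map<String, List<String>> candidates) {
        List<String> result = new ArrayList<>();
        result.add(""); // start with a single empty combination
        for (List<String> values : candidates.values()) {
            List<String> next = new ArrayList<>();
            for (String prefix : result) {
                for (String v : values) {
                    next.add(prefix.isEmpty() ? v : prefix + " " + v);
                }
            }
            result = next;
        }
        return result;
    }

    public static void main(String[] args) {
        // Illustrative candidates only; values should come from your own hardware and heap needs.
        Map<String, List<String>> candidates = new LinkedHashMap<>();
        candidates.put("heap", List.of("-Xms1g -Xmx1g", "-Xms2g -Xmx2g"));
        candidates.put("collector", List.of("-XX:+UseParallelGC", "-XX:+UseG1GC"));
        candidates.put("young", List.of("-XX:NewRatio=2", "-XX:NewRatio=3"));
        for (String opts : combinations(candidates)) {
            System.out.println("CATALINA_OPTS=\"" + opts + "\"");
        }
    }
}
```

A full grid grows quickly. Twenty options with three candidate values each give 3^20 (about 3.5 billion) combinations, far more than a one-week budget allows. In practice it is more realistic to vary one or two options at a time from a baseline.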

This may not be the absolute best way to do this, but it's the best way I could design with the time I've been given for this project (about a week).

Second question: what does SO think about this setup? How would SO create an "optimizing solution" any differently?

Last but not least, I'm curious as to what sort of metrics I could use as a basis of measure and comparison. I can really only think of:

Find the JVM option config that produces the fastest average end-to-end travel time for messages
Find the JVM option config that produces the largest volume throughput without crashing any of the servers

Any others? Any reasons why those 2 are bad?
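One consideration with the two proposed metrics is that an average can hide the pauses that matter, such as an occasional long full GC. This minimal sketch compares the mean with nearest-rank percentiles. The latency samples are invented to include one GC-pause-like outlier, and `LatencyStats` is a hypothetical helper.

```java
import java.util.Arrays;

public class LatencyStats {

    /** Arithmetic mean of the samples (the "average end-to-end travel time"). */
    static double mean(long[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double sum = 0;
        for (long s : samples) {
            sum += s;
        }
        return sum / samples.length;
    }

    /** Nearest-rank percentile: p = 99 gives the 99th percentile, p = 100 the maximum. */
    static long percentile(long[] samples, double p) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[rank - 1];
    }

    public static void main(String[] args) {
        // Hypothetical latencies in ms: mostly fast, with one long outlier.
        long[] latencies = {12, 11, 13, 12, 14, 11, 12, 13, 12, 250};
        // prints: mean=36.0 ms p50=12 ms p90=14 ms max=250 ms
        System.out.printf("mean=%.1f ms p50=%d ms p90=%d ms max=%d ms%n",
                mean(latencies), percentile(latencies, 50),
                percentile(latencies, 90), percentile(latencies, 100));
    }
}
```

Here the mean (36 ms) is about three times the typical latency because of a single slow message. Reporting a high percentile and the maximum alongside the mean makes such pauses visible when comparing JVM configurations.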

After reviewing the play, I can see how this might be construed as a monolithic question, but really what I'm asking is how SO would optimize JVMs running along a pipeline; feel free to cut-and-dice my solution however you like.

Thanks in advance!

3 answers

Let me go up a level and say I did something similar in a large C app many years ago. It consisted of a number of processes exchanging messages across interconnected hardware. I came up with a two-step approach.

Step 1. Within each process, I used this technique to get rid of any wasteful activities. That took a few days of sampling, revising code, and repeating. The idea is that there is a chain, and the first thing to do is remove inefficiencies from the links.
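The technique the answer links to is not preserved here. As a hedged Java analogue (an assumption, not necessarily the linked method), you can take periodic snapshots of a busy thread's stack with `Thread.getStackTrace()` and count which methods keep appearing. Code that is on the stack in a large fraction of samples is where the time is going. `StackSampler`, `wastefulWork` and the worker thread are hypothetical demo code.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class StackSampler {

    /** Takes `samples` stack snapshots of `target`, `intervalMillis` apart; counts snapshots containing each method. */
    static Map<String, Integer> sample(Thread target, int samples, long intervalMillis) {
        Map<String, Integer> counts = new HashMap<>();
        for (int i = 0; i < samples; i++) {
            Set<String> seen = new HashSet<>(); // count a method once per snapshot, even if recursive
            for (StackTraceElement frame : target.getStackTrace()) {
                seen.add(frame.getClassName() + "." + frame.getMethodName());
            }
            for (String method : seen) {
                counts.merge(method, 1, Integer::sum);
            }
            if (i < samples - 1) {
                try {
                    Thread.sleep(intervalMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    break;
                }
            }
        }
        return counts;
    }

    static volatile double sink; // keeps the JIT from discarding the demo work

    /** Deliberately wasteful work, standing in for a real hot spot. */
    static void wastefulWork() {
        double x = 0;
        for (int i = 0; i < 1_000_000; i++) {
            x += Math.sqrt(i);
        }
        sink = x;
    }

    public static void main(String[] args) {
        Thread worker = new Thread(() -> {
            while (!Thread.currentThread().isInterrupted()) {
                wastefulWork();
            }
        }, "worker");
        worker.setDaemon(true);
        worker.start();
        int samples = 20;
        Map<String, Integer> counts = sample(worker, samples, 5);
        worker.interrupt();
        counts.entrySet().stream()
                .sorted(Map.Entry.<String, Integer>comparingByValue().reversed())
                .limit(5)
                .forEach(e -> System.out.println(e.getValue() + "/" + samples + "  " + e.getKey()));
    }
}
```

A handful of samples is usually enough to spot something that takes a large share of the time. This matches the answer's point that precise statistics are not the goal.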

Step 2. This part is laborious but effective: generate time-stamped logs of message traffic. Merge them together into a common timeline. Look carefully at specific message sequences. What you're looking for is:

Was the message necessary, or was it a retransmission resulting from a timeout or other avoidable reason?
When was the message sent, received, and acted upon? If there is a significant delay between being received and acted upon, what is the reason for that delay? Was it just a matter of being "in line" behind another process that was doing I/O, for example? Could it have been fixed with different process priorities?
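Step 2 can be sketched as follows, assuming each component writes log lines with a millisecond timestamp, component name, message id and an event kind. The `Event` record and its fields are hypothetical, and records need Java 16+. Merging logs from different machines only makes sense if their clocks are synchronized. The received-to-acted gap within one component uses a single clock, so it is more robust.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TimelineMerge {

    /** One log line: when a component saw a message and what it did ("received", "acted", "sent"). */
    record Event(long timeMillis, String component, String messageId, String kind) {}

    /** Merge per-component logs into one timeline ordered by timestamp (stable for ties). */
    static List<Event> merge(List<List<Event>> logs) {
        List<Event> all = new ArrayList<>();
        for (List<Event> log : logs) {
            all.addAll(log);
        }
        all.sort(Comparator.comparingLong(Event::timeMillis));
        return all;
    }

    /** Messages whose received -> acted gap inside one component exceeded the threshold. */
    static List<String> slowHandoffs(List<Event> timeline, long thresholdMillis) {
        Map<String, Long> receivedAt = new HashMap<>();
        List<String> slow = new ArrayList<>();
        for (Event e : timeline) {
            String key = e.component() + "/" + e.messageId();
            if (e.kind().equals("received")) {
                receivedAt.put(key, e.timeMillis());
            } else if (e.kind().equals("acted")) {
                Long receivedTime = receivedAt.remove(key);
                if (receivedTime != null && e.timeMillis() - receivedTime > thresholdMillis) {
                    slow.add(key + " waited " + (e.timeMillis() - receivedTime) + " ms");
                }
            }
        }
        return slow;
    }

    public static void main(String[] args) {
        // Hypothetical logs from two components handling one message.
        List<Event> logA = List.of(
                new Event(0, "A", "m1", "received"),
                new Event(2, "A", "m1", "acted"),
                new Event(3, "A", "m1", "sent"));
        List<Event> logB = List.of(
                new Event(5, "B", "m1", "received"),
                new Event(120, "B", "m1", "acted"));
        List<Event> timeline = merge(List.of(logB, logA));
        timeline.forEach(System.out::println);
        System.out.println(slowHandoffs(timeline, 50)); // prints [B/m1 waited 115 ms]
    }
}
```

The flagged entries are only starting points. The answer's actual work is reading the surrounding sequence to find out why the message sat in line.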

This activity took me about a day to generate logs, combine them, find a speedup opportunity, and revise code. At this rate, after about 10 working days, I had found and fixed a number of problems, and improved the speed dramatically.

What is common to these two steps is that I'm not measuring or trying to get "statistics". If something is spending too much time, that very fact exposes it to a diligent programmer taking a close, meticulous look at what is happening.

I would start by finding the recommended JVM settings for your hardware/software mix, or just start with what is already out there.

Next, I would make sure that I have monitoring in place to measure business throughput and SLAs.
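Business-level monitoring can start with two numbers per run: the share of messages that met the SLA and the number of completed messages per second. This is a minimal sketch. The 200 ms / 99% SLA, the sample latencies and the `SlaCheck` name are made-up values for illustration.

```java
public class SlaCheck {

    /** Fraction of messages whose end-to-end latency met the SLA (0.0 to 1.0). */
    static double fractionWithinSla(long[] latenciesMillis, long slaMillis) {
        if (latenciesMillis.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        int within = 0;
        for (long l : latenciesMillis) {
            if (l <= slaMillis) {
                within++;
            }
        }
        return within / (double) latenciesMillis.length;
    }

    /** Business throughput: completed messages per second over the measurement window. */
    static double throughputPerSecond(long completedMessages, long windowMillis) {
        if (windowMillis <= 0) {
            throw new IllegalArgumentException("window must be positive");
        }
        return completedMessages * 1000.0 / windowMillis;
    }

    public static void main(String[] args) {
        // Hypothetical SLA: 99% of messages through the pipeline within 200 ms.
        long slaMillis = 200;
        double target = 0.99;
        long[] latencies = {120, 95, 180, 210, 150, 130, 99, 170, 160, 140};
        double within = fractionWithinSla(latencies, slaMillis);
        System.out.printf("within SLA: %.1f%% (target %.0f%%) -> %s%n",
                within * 100, target * 100, within >= target ? "PASS" : "FAIL");
        System.out.printf("throughput: %.1f msg/s%n", throughputPerSecond(30_000, 60_000));
    }
}
```

Keeping the pass/fail criterion in business terms makes it easier to judge whether a JVM option change actually helped.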

I would not try to tweak just the GC if there is no reason to.

First you will need to find the major bottlenecks in your application: is it I/O-bound, SQL-bound, etc.?

The key here is to MEASURE, IDENTIFY the TOP bottlenecks, FIX them, and conduct another iteration with a repeatable load.
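The measure, identify, fix loop needs per-stage timings so bottlenecks can be ranked. This is a minimal sketch, assuming you can wrap each stage (I/O, SQL, parsing, etc.) in a call. `StageTimer` and the stage names are hypothetical, and a real profiler would give richer data.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;
import java.util.stream.Collectors;

public class StageTimer {
    // Accumulated wall-clock nanoseconds per stage; safe to update from several threads.
    private final Map<String, Long> totalNanos = new ConcurrentHashMap<>();

    /** Add an observed duration to a stage's running total. */
    void record(String stage, long nanos) {
        totalNanos.merge(stage, nanos, Long::sum);
    }

    /** Run the work, charging its wall-clock time to the named stage. */
    <T> T time(String stage, Supplier<T> work) {
        long t0 = System.nanoTime();
        try {
            return work.get();
        } finally {
            record(stage, System.nanoTime() - t0);
        }
    }

    /** Names of the n stages with the largest accumulated time, largest first. */
    List<String> topStages(int n) {
        return totalNanos.entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(n)
                .map(Map.Entry::getKey)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        StageTimer timer = new StageTimer();
        // Hypothetical stages; in practice these would wrap real I/O, SQL, parsing, etc.
        for (int i = 0; i < 3; i++) {
            timer.time("parse", () -> "payload".toUpperCase());
            timer.time("sql", () -> { sleep(20); return null; });
            timer.time("io", () -> { sleep(5); return null; });
        }
        System.out.println("top bottlenecks: " + timer.topStages(2));
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

With the simulated delays, "sql" should rank first. After fixing the top stage, rerun the same repeatable load and compare the new ranking.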

HTH...

The biggest trick I am aware of when running multiple JVMs on the same machine is limiting the number of cores the GC will use. Otherwise, when one JVM does a full GC, it will attempt to grab every core, impacting the performance of all the JVMs even though they are not performing a GC. One suggestion is to limit the number of GC threads to 5/8 of the cores or fewer. (I can't remember where that is written.)
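On HotSpot, the number of parallel GC worker threads is set with `-XX:ParallelGCThreads`, and the concurrent collectors also have `-XX:ConcGCThreads`. This sketch applies the answer's 5/8 rule of thumb to the local core count. The rule is the answerer's recollection, not an official recommendation, and `GcThreads` is a hypothetical helper.

```java
public class GcThreads {

    /** The answer's rule of thumb: at most 5/8 of the cores, but never fewer than one thread. */
    static int suggestedGcThreads(int cores) {
        return Math.max(1, cores * 5 / 8);
    }

    public static void main(String[] args) {
        int cores = Runtime.getRuntime().availableProcessors();
        // Print a flag that could be appended to CATALINA_OPTS for each Tomcat JVM on this host.
        System.out.println("-XX:ParallelGCThreads=" + suggestedGcThreads(cores));
    }
}
```

With several JVMs sharing one host, you could divide further by the number of JVMs. That refinement is an assumption that goes beyond this answer.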

I think you should test the system as a whole to ensure you have realistic interaction between the services. However, I would assume you may need to tune each service differently.

Changing command-line options is useful if you cannot change the code. However, if you profile and optimise the code, you can make far more difference than by tuning the GC parameters (and once the code changes, you need to re-tune those parameters anyway).

For this reason, I would only change the command-line parameters as a last resort, once there is little improvement left to be made in the application's code.