
NodeJS idle operation during performance tests

Posted 2016-08-25 15:43:31. Tags: javascript, node.js, google-chrome, jmeter, performance-testing

Relative newbie to Node.js here, trying to figure out a performance issue in a newly built application.

I am running a performance test on my Node 0.12.7 app and I find the server hanging intermittently. It needs a restart once it reaches that state. After confirming there is no memory leak (the process heap does not exceed 500 MB, whereas the default max heap size is 1.4 GB, as I understand it), we moved on to checking the CPU profile. I have used a snippet of code that depends on v8-profiler to capture profiles regularly.

Here is one of the charts that we got from JMeter (although the server didn't hang):

We plotted flame graphs in Chrome by loading the CPU profiles. I was expecting to find the JS stuck somewhere at this point, but I find that in exactly that time range the Node server is idle for a long time. Could anyone help me understand the probable causes for the server staying idle while being bombarded with client requests, and then eventually recovering and continuing operations after 10 minutes?

Unfortunately, I have lost the data needed to check whether the responses between 16:48:10 and 16:57:40 were errors or successes, but they were very likely error responses from the proxy, since Node didn't have a care in the world.

Here are the flame charts seen in Chrome:

Before 16:47
Around 16:47
A couple of minutes after 16:47

1 Answer

There could be multiple reasons for this.

The server may not be accepting the requests. Do you see a drop in throughput after you reach the peak?
Have you checked the server logs to see if any exceptions are logged?
Try plotting trends of response time and throughput for your test duration.
You may want to look at any I/O-bound operations in your code.
Check the processor queue length. You should see it building up if processes are not getting enough CPU.
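
To make the suggestion about response time and throughput trends concrete, here is a minimal sketch that groups load-test samples into fixed time windows and reports throughput, mean response time, and error count for each. The function name `summarizeByWindow` and the sample shape (`timeStamp`, `elapsed`, `success`) are assumptions for illustration, loosely modelled on the columns JMeter writes to its CSV results; parsing the results file is left out.

```javascript
// Sketch only: summarizeByWindow and the sample shape are hypothetical.
// Assumed sample: { timeStamp: <epoch ms>, elapsed: <ms>, success: <boolean> }
function summarizeByWindow(samples, windowMs) {
  var buckets = {};
  samples.forEach(function (s) {
    // Align each sample to the start of its window.
    var start = Math.floor(s.timeStamp / windowMs) * windowMs;
    var b = buckets[start] ||
      (buckets[start] = { count: 0, totalElapsed: 0, errors: 0 });
    b.count += 1;
    b.totalElapsed += s.elapsed;
    if (!s.success) b.errors += 1;
  });
  // Emit one row per non-empty window, in time order.
  return Object.keys(buckets)
    .map(Number)
    .sort(function (a, b) { return a - b; })
    .map(function (start) {
      var b = buckets[start];
      return {
        windowStart: start,
        throughputPerSec: b.count / (windowMs / 1000),
        meanElapsedMs: b.totalElapsed / b.count,
        errors: b.errors
      };
    });
}

// Example with made-up samples and a 1-second window.
var rows = summarizeByWindow([
  { timeStamp: 0, elapsed: 100, success: true },
  { timeStamp: 500, elapsed: 300, success: true },
  { timeStamp: 1200, elapsed: 5000, success: false }
], 1000);
console.log(rows);
```

Windows with no samples do not appear in the output, so a gap between consecutive `windowStart` values means no requests were recorded in that period. Lining those gaps up against the idle stretch in the CPU profile should show whether requests were reaching Node at all.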