Does Node.js or Express itself block the event loop?

I have a very simple Node.js 8.5.0, Express 4.15.5 server using the cluster module, serving static files. The problem is that the event loop seems to be blocked for far too long at times. I'm using the blocked module with a 70 ms interval, and I also keep track of how many requests have been handled since the last check. Many times the counter is simply zero: the event loop is sometimes blocked for a second or more while there were no requests.

Log:

Execution blocked for 1056 ms [2017-09-27 16:18:06.322], 1 requests, total requestcount 115, pid 31071
Execution blocked for 358 ms [2017-09-27 16:18:12.570], 0 requests, total requestcount 123, pid 31071
Execution blocked for 1578 ms [2017-09-27 16:18:15.551], 10 requests, total requestcount 147, pid 31071
Execution blocked for 872 ms [2017-09-27 16:18:35.926], 0 requests, total requestcount 557, pid 31077
Execution blocked for 117 ms [2017-09-27 16:20:11.668], 0 requests, total requestcount 761, pid 31077
Execution blocked for 381 ms [2017-09-27 16:23:00.268], 0 requests, total requestcount 2231, pid 31077
Execution blocked for 1206 ms [2017-09-27 16:23:06.096], 2 requests, total requestcount 3147, pid 31070
Execution blocked for 505 ms [2017-09-27 16:23:10.319], 2 requests, total requestcount 2256, pid 31077
Execution blocked for 475 ms [2017-09-27 16:23:10.335], 1 requests, total requestcount 840, pid 31071
Execution blocked for 2113 ms [2017-09-27 16:23:16.918], 1 requests, total requestcount 2283, pid 31077
Execution blocked for 303 ms [2017-09-27 16:23:20.071], 0 requests, total requestcount 3261, pid 31070
Execution blocked for 423 ms [2017-09-27 16:23:23.417], 1 requests, total requestcount 3267, pid 31070
Execution blocked for 6395 ms [2017-09-27 16:23:31.633], 7 requests, total requestcount 3285, pid 31070
Execution blocked for 210 ms [2017-09-27 16:32:04.764], 10 requests, total requestcount 3306, pid 31071
Execution blocked for 690 ms [2017-09-27 16:32:05.945], 1 requests, total requestcount 3313, pid 31071
Execution blocked for 704 ms [2017-09-27 16:32:05.948], 5 requests, total requestcount 5214, pid 31077
Execution blocked for 857 ms [2017-09-27 16:32:07.082], 0 requests, total requestcount 3315, pid 31071
Execution blocked for 1475 ms [2017-09-27 16:32:12.691], 0 requests, total requestcount 3333, pid 31071
Execution blocked for 1487 ms [2017-09-27 16:32:12.692], 1 requests, total requestcount 5247, pid 31077
Execution blocked for 125 ms [2017-09-27 16:32:16.306], 0 requests, total requestcount 7921, pid 31070
Execution blocked for 189 ms [2017-09-27 16:33:16.369], 0 requests, total requestcount 8087, pid 31070
Execution blocked for 182 ms [2017-09-27 16:33:16.621], 0 requests, total requestcount 8087, pid 31070

strace example:

epoll_wait(6, [], 1024, 70)             = 0
epoll_wait(6, [], 1024, 70)             = 0
epoll_wait(6, [], 1024, 70)             = 0
write(2, "Execution blocked for 724 ms [20"..., 103) = 103
epoll_wait(6, [{EPOLLIN, {u32=24, u64=24}}], 1024, 70) = 1
read(24, "", 1024)                      = 0
epoll_ctl(6, EPOLL_CTL_DEL, 24, 0x7fff8ef58de0) = 0
close(24)                               = 0
epoll_wait(6, [], 1024, 0)              = 0
epoll_wait(6, [], 1024, 69)             = 0
epoll_wait(6, [], 1024, 70)             = 0

There is also plenty of memory and CPU available (3 cores):

top - 16:36:50 up 6 days,  5:51,  4 users,  load average: 0.17, 0.37, 0.45
Tasks: 137 total,   1 running, 136 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.3 us,  0.6 sy,  0.0 ni, 91.0 id,  0.1 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  4562340 total,   170144 free,  2234000 used,  2158196 buff/cache
KiB Swap:  1048572 total,   993992 free,    54580 used.  2075596 avail Mem

I also set up GC monitoring, but the 100 ms threshold below is rarely reached:

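const { PerformanceObserver } = require('perf_hooks');

// Warn about any garbage-collection pause longer than 100 ms.
const obs = new PerformanceObserver((list) => {
    let gc = list.getEntries()[0];
    if (gc.duration > 100) {
        console.warn('gc', gc);
    }
});
obs.observe({ entryTypes: ['gc'] });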

Does Express, or one of the modules it uses, cause the blocking even when seemingly nothing is happening? How can I debug that? If not, is it Node.js itself? If not, what? Since not blocking the event loop is basic to Node.js, I'd presume there are tools to debug this, but I couldn't find any.

Edit: Tested with both the spdy module and the native https module; no difference.

Edit: Source code:

"use strict";

const   bodyParser = require('body-parser'),
        cluster = require('cluster'),
        cors = require('cors'),
        compress = require('compression'),
        cookieParser = require('cookie-parser'),
        express = require('express'),
        favicon = require('serve-favicon'),
        fs = require('fs'),
        http = require('http'),
//        https = require('spdy'),
        https = require('https'),
        path = require('path'),
        strftime = require('strftime');

const {
        PerformanceObserver
} = require('perf_hooks');

global.V = {};

const workers = process.argv[3] || 3;

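// Call cb when a timer tick arrives more than `interval` ms late;
// otherwise call cb_ok if it was given.
function blocked(interval, cb, cb_ok) {
        var start = process.hrtime();

        return setInterval(function() {
                let delta = process.hrtime(start);
                let nanosec = delta[0] * 1e9 + delta[1];
                let ms = nanosec / 1e6;
                let n = ms - interval;
                if (n > interval) {
                        cb(Math.round(n));
                }
                else if (cb_ok) {
                        cb_ok(Math.round(n));
                }
                start = process.hrtime();
                V.httpRequests2 = 0;
        }, interval).unref();
}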

if (cluster.isMaster) {
        console.log(`Master ${process.pid} is running`);

        // Fork workers.
        for (let i = 0; i < workers; i++) {
                cluster.fork();
        }

        // Replace a dead worker after two minutes.
        cluster.on('exit', (worker, code, signal) => {
                console.log(`worker ${worker.process.pid} died`);
                setTimeout(function() {
                        console.log('Fork one replacement worker...');
                        cluster.fork();
                }, 120000);
        });
}
else {
    V.expressOptions = {
           key: fs.readFileSync('./ssl/server.key'),
           cert: fs.readFileSync('./ssl/ssl-blunde.crt'),
           requestCert: false,
           rejectUnauthorized: false
    };

    V.expressApp = express();

    V.server_ssl = https.createServer(V.expressOptions, V.expressApp);
    V.expressApp.use(cors({origin: 'https://example.com'}));
    V.expressApp.use(favicon(__dirname + '/static/html/favicon.ico'));
    V.expressApp.use(bodyParser.urlencoded({ extended: true }));

    V.httpRequests = 0;
    V.httpRequests2 = 0;
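    V.expressApp.use('*', function(req, res, next) {
           V.httpRequests++;
           V.httpRequests2++;
           next();
    });
    V.expressApp.use('/', express.static(path.join(__dirname, 'static/html')));

    V.expressApp.use(express.static(path.join(__dirname, 'static'), {
         maxAge: 1000 * 60 * 60
    }));
    V.expressApp.use(function (err, req, res, next) {
           next(err); // assumed; the original handler body is not preserved
    });
    V.expressApp.use(function (err, req, res, next) {
           if (req.xhr) {
                   console.log('Express error', err);
                   res.status(500).send({ error: 'Something blew up!' });
           }
           else {
                   next(err); // assumed; the original branch body is not preserved
           }
    });
    V.expressApp.use(function (err, req, res, next) {
           console.log('Express error 500', err);
           res.status(500).end(); // assumed; the original response call is not preserved
    });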

    // (The V.server_ssl.listen(...) call is not shown in the post.)
    console.log(`Worker ${process.pid} started`);
    blocked(70, function(ms) {
           if (ms > 2500) {
                   console.error('Execution blocked for ' + ms + ' ms [' + strftime('%F %T.%L') + '], %s requests, total requestcount %s, pid %s', V.httpRequests2, V.httpRequests, process.pid);
           }
           else if (ms > 500) {
                   console.warn('Execution blocked for ' + ms + ' ms [' + strftime('%F %T.%L') + '], %s requests, total requestcount %s, pid %s', V.httpRequests2, V.httpRequests, process.pid);
                   if (V.httpRequests > 200000) {
                           console.log('Enough requests, exit, requestcount %s, pid %s', V.httpRequests, process.pid);
                           process.exit(0); // assumed from the log message; this line is not preserved
                   }
           }
           else {
                   console.log('Execution blocked for ' + ms + ' ms [' + strftime('%F %T.%L') + '], %s requests, total requestcount %s, pid %s', V.httpRequests2, V.httpRequests, process.pid);
           }
    });

    // Log GC pauses: warn above 500 ms, note anything above 100 ms.
    const obs = new PerformanceObserver((list) => {
           let gc = list.getEntries()[0];
           if (gc.duration > 500) {
                   console.warn('GC', gc);
           }
           else if (gc.duration > 100) {
                   console.log('GC', gc.duration);
           }
    });
    obs.observe({ entryTypes: ['gc'] });
}

Edit: It seems to be related to how Node.js communicates with its threads: the following futex/EAGAIN pattern occurs every time the event loop is blocked. So Node.js is clearly waiting for something, which in practice blocks the event loop. The problem is not I/O, as none of the threads is blocked on I/O.

782050 16:14:56.945451111 5 node (17387) < futex res=0
782051 16:14:56.945493832 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2
782052 16:14:56.945494164 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782053 16:14:56.945494233 3 node (17385) < futex res=-11(EAGAIN)
782054 16:14:56.945494712 3 node (17385) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782055 16:14:56.945494814 5 node (17387) < futex res=0
782056 16:14:56.945494872 3 node (17385) < futex res=0
782057 16:14:56.945495204 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2
782058 16:14:56.945495491 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782059 16:14:56.945495541 3 node (17385) < futex res=-11(EAGAIN)
782060 16:14:56.945495941 5 node (17387) < futex res=0
782061 16:14:56.945495992 3 node (17385) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782062 16:14:56.945496239 3 node (17385) < futex res=0
782063 16:14:56.945496460 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2
782064 16:14:56.945496661 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782065 16:14:56.945496780 3 node (17385) < futex res=-11(EAGAIN)
782066 16:14:56.945497107 5 node (17387) < futex res=0
782067 16:14:56.945497232 3 node (17385) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782068 16:14:56.945497381 3 node (17385) < futex res=0
782069 16:14:56.945497596 3 node (17385) > futex addr=7F8F03C8FB20 op=128(FUTEX_PRIVATE_FLAG) val=2
782070 16:14:56.945497764 5 node (17387) > futex addr=7F8F03C8FB20 op=129(FUTEX_PRIVATE_FLAG|FUTEX_WAKE) val=1
782071 16:14:56.945497913 3 node (17385) < futex res=-11(EAGAIN)
782072 16:14:56.945498204 5 node (17387) < futex res=0

Node.js or Express should not be blocking the event loop when there is nothing to do. A bit of time may occasionally be spent on garbage collection, but I would not expect that to take anywhere near the 6395 ms you observed.

The built-in means of serving static files uses only asynchronous I/O, so that should not be blocking the event loop either.

If you want further help diagnosing what else in your app might be causing this, you will likely have to show us your code.

In answer to your direct question:

Does Node.js or express itself block event loop?

No, other than for very short periods of garbage collection. If your server uses a very large number of JavaScript objects and is very, very busy, garbage collection can occasionally fall behind and take a bit of time to catch up, but that would only happen on a very busy server whose code uses lots of objects (and thus creates lots of GC work).
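One way to check this on your own server is to sample event loop delay directly. The sketch below assumes Node 11.10 or later (not the Node 8.5.0 from the question), where perf_hooks.monitorEventLoopDelay is available. The names startLoopDelayMonitor and nsToMs are made up for illustration.

```javascript
'use strict';
const { monitorEventLoopDelay } = require('perf_hooks');

// The histogram reports nanoseconds; convert a reading to milliseconds.
function nsToMs(ns) {
    return ns / 1e6;
}

// Start sampling event loop delay every `resolutionMs` milliseconds.
// Returns report(), which summarises and resets the samples, and stop().
function startLoopDelayMonitor(resolutionMs) {
    const histogram = monitorEventLoopDelay({ resolution: resolutionMs });
    histogram.enable();

    return {
        report: function () {
            const summary = {
                meanMs: nsToMs(histogram.mean),
                maxMs: nsToMs(histogram.max),
                p99Ms: nsToMs(histogram.percentile(99))
            };
            histogram.reset();
            return summary;
        },
        stop: function () {
            histogram.disable();
        }
    };
}
```

Calling report() from a setInterval every few seconds and logging maxMs should give roughly the same signal as the blocked() helper in the question, without a hand-rolled timer.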

FYI, if all you are doing is serving static files in a high-load environment, there are more efficient ways to serve them than Express. One common method is to put Nginx in front of the Express server and have Nginx serve the static files directly from the file system. There are also CDNs for larger-scale situations.

For further help, please show us your actual Express code so we can see what your server is doing.

To answer my own question: there can be a scenario where all worker threads are busy. This wasn't my case, however.
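For reference, the "all worker threads are busy" scenario can be reproduced with a small sketch. It relies on asynchronous fs calls and crypto.pbkdf2 sharing the libuv thread pool, which has 4 threads by default and can be resized with the UV_THREADPOOL_SIZE environment variable. The function name measureThreadpoolWait and the job parameters are made up for illustration.

```javascript
'use strict';
const crypto = require('crypto');
const fs = require('fs');

// Queue `jobs` CPU-heavy pbkdf2 calls (they run on the libuv thread pool),
// then queue one fs.stat and record how long the stat took to complete.
// done(err, result) is called once every queued operation has finished.
function measureThreadpoolWait(jobs, done) {
    const start = process.hrtime();
    let pending = jobs + 1; // the pbkdf2 jobs plus the single stat
    let statWaitMs = null;
    let failed = false;

    function elapsedMs() {
        const d = process.hrtime(start);
        return d[0] * 1e3 + d[1] / 1e6;
    }

    function finish(err) {
        if (failed) return;
        if (err) {
            failed = true;
            return done(err);
        }
        if (--pending === 0) {
            done(null, { jobs: jobs, statWaitMs: statWaitMs, totalMs: elapsedMs() });
        }
    }

    for (let i = 0; i < jobs; i++) {
        crypto.pbkdf2('secret', 'salt', 100000, 64, 'sha512', finish);
    }

    // Any existing path works; the current directory is used here.
    fs.stat('.', function (err) {
        statWaitMs = elapsedMs();
        finish(err);
    });
}
```

With more jobs than pool threads, statWaitMs would be expected to grow, because the stat has to wait for a free thread. The event loop itself keeps turning in that case, so a timer-based check like blocked() would not necessarily report it.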

Node.js 8 handles worker threads somewhat differently from previous versions. Downgrading to Node 7.10.1 completely solved the problem. As the problem also occurs with a simple Express server, I'd conclude it is a bug in Node 8.

