How can I find the cause of 100% CPU usage on a Node.js server?

I'm running a Node.js server with socket.io. It's a simple chat server. It had been running for 2 years on fairly old software versions, so I updated them recently. Since the update, the server frequently consumes 100% CPU. It worked well for 2 years, so I don't think the application code is the cause, but I can't find out what the problem is.

Before I updated:

  • Node.js 0.8.14
  • socket.io 0.9.16
  • express 2.5.2

Now I'm using:

  • Node.js 0.10.28 ~ 0.11.13 (tried both)
  • socket.io 1.0.1
  • express 4.1.1

I tried benchmarking but couldn't reproduce the problem. I found that template rendering is pretty slow, but my chat server is for mobile apps, so it rarely serves HTML pages. Only the admin page uses the template engine, and the CPU hits 100% even when I'm not viewing the admin pages.

Using strace, I got this:

strace -r -p 32224 -c
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 16.91    0.003417          35        97           futex
 14.47    0.002923           8       347        72 epoll_ctl
 14.10    0.002848          20       144           write
 11.32    0.002286          15       152           read
  6.27    0.001266          18        70           close
  5.77    0.001165          19        61        61 connect
  5.53    0.001117           6       183           clock_gettime
  5.20    0.001051         117         9           munmap
  4.65    0.000940           5       173           gettimeofday
  4.19    0.000846          14        61           socket
  3.72    0.000752           6       122           ioctl
  3.36    0.000679          12        58           epoll_wait
  2.34    0.000473           7        72           getsockopt
  1.95    0.000394          56         7           mmap
  0.22    0.000045          23         2           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.020202                  1558       133 total

However, I don't know how to analyze this report. epoll_ctl seems to be used by the event loop, and the epoll_ctl errors may be caused by the connect errors, right? I found that the connect syscall is for opening socket connections, but I can't get any further.

This strace report covers 2 minutes. There weren't many users, just 2~5 during that time.
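
As a rough sense of scale, the summary total can be set against the 2-minute window. The sketch below assumes that `strace -c` is reporting kernel (system) CPU time, which is its default on Linux, and that the trace really ran for about 120 seconds. The function name is only illustrative.

```javascript
// Fraction of the observation window accounted for by traced system calls.
// Assumption: `syscallSeconds` is the "total" seconds from the strace -c
// summary and `windowSeconds` is how long strace was attached.
function syscallShare(syscallSeconds, windowSeconds) {
  return syscallSeconds / windowSeconds;
}

var share = syscallShare(0.020202, 120);
console.log((share * 100).toFixed(3) + '% of the window');
```

If those assumptions hold, system calls account for only a tiny fraction of the window, which would suggest the CPU time is being spent in user space (JavaScript or the V8 runtime) rather than in the kernel.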

Can I find the cause using this report, or do I have to find another way to debug?

You can use the V8 profiler, which outputs a report that can be read in Chrome's profiling tab. If you use PM2 and Keymetrics, it's really easy: install the v8-profiler and pmx modules, make sure to require the pmx module in your script, and then start profiling from the Keymetrics site. You can also use the V8 profiler on its own to get the same report; it's just a little more work.
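
For the standalone route, a minimal sketch might look like the following. It assumes the `startProfiling` / `stopProfiling` / `export` / `delete` calls described in the v8-profiler README. The helper names, the profile name `cpu-spike`, the output path and the SIGUSR2 trigger are all hypothetical choices for illustration. The profiler object is passed in as an argument so the helpers themselves do not depend on the native module being installed.

```javascript
var fs = require('fs');

// Start a named CPU profile. `profiler` is expected to be the object
// returned by require('v8-profiler').
function startCpuProfile(profiler, name) {
  profiler.startProfiling(name, true); // true = record samples
}

// Stop the named profile and write the exported JSON to `outPath`.
// `done` is an optional callback(err, outPath).
function stopCpuProfile(profiler, name, outPath, done) {
  var profile = profiler.stopProfiling(name);
  profile.export(function (err, json) {
    if (!err) fs.writeFileSync(outPath, json);
    profile.delete(); // release the memory held by the profile
    if (done) done(err, outPath);
  });
}

// Hypothetical wiring (left commented out; needs `npm install v8-profiler`):
// profile for 30 seconds whenever the process receives SIGUSR2.
//
// var profiler = require('v8-profiler');
// process.on('SIGUSR2', function () {
//   startCpuProfile(profiler, 'cpu-spike');
//   setTimeout(function () {
//     stopCpuProfile(profiler, 'cpu-spike', '/tmp/cpu-spike.cpuprofile');
//   }, 30000);
// });
```

With wiring like this, you could send `kill -USR2 <pid>` while the CPU is pegged and then load the resulting `.cpuprofile` file in Chrome to see which functions are consuming the time.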
