
Node.js architecture and performance

I have a question about the architecture and performance of Node.js.

I've done a bunch of reading on the topic (including on Stack Overflow), and I still have a couple of questions. I'd like to do two things:

  1. Summarize what I've learned from crawling many different sources, semi-concisely, to see if my conclusions are correct.
  2. Ask a couple of questions about the threading and performance of Node that I haven't been able to pin down exact answers to in my research.

Node has a Single-Threaded, Asynchronous Event-Handling Architecture

Single-Threaded - There is a single event thread that dispatches asynchronous work (typically I/O, but it can be computation) and performs callback execution (i.e. handling of async work results).

  • The event thread runs in an infinite "event loop" doing the two jobs above: a) handling requests by dispatching async work, and b) noticing that previous async work results are ready and executing a callback to process the results.

  • The common analogy here is the restaurant order taker: the event thread is a super-fast waiter who takes orders (service requests) from the dining room and delivers them to the kitchen to be prepared (dispatches async work), but who also notices when food is ready (async results) and delivers it back to the table (callback execution).

  • The waiter doesn't cook any food; his job is to go back and forth between the dining room and the kitchen as quickly as possible. If he gets bogged down taking an order in the dining room, or if he is forced to go back into the kitchen to prepare one of the meals, the system becomes inefficient and system throughput suffers.
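The ordering guarantee behind the analogy can be seen in a few lines of Node (a sketch; the "order" labels are made up):

```javascript
// Synchronous code always runs to completion before any callback fires:
// the "waiter" finishes his current trip before serving the kitchen's output.
const order = [];

// Dispatch async work ("hand the order to the kitchen").
setTimeout(() => order.push('serve food (callback)'), 0);

// The event thread keeps moving; even a zero-delay callback cannot interrupt it.
order.push('take order A');
order.push('take order B');

console.log(order); // the timer callback has not run yet
```

Only when the synchronous code finishes and control returns to the event loop does the timer callback get a chance to run.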

Asynchronous - The asynchronous workflow resulting from a request (e.g. a web request) is logically a chain, for example:

   FIRST [ASYNC: read a file, figure out what to get from the database] THEN 
   [ASYNC: query the database] THEN 
   [format and return the result].

The work labeled "ASYNC" above is "kitchen work", and the "FIRST []" and "THEN []" represent the involvement of the waiter initiating a callback.

Chains like this are represented programmatically in three common ways:

  • nested functions/callbacks

  • promises chained with .then()

  • async methods that await() async results

All these coding approaches are pretty much equivalent, although async/await appears to be the cleanest and makes reasoning about asynchronous code easier.
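For concreteness, here is the same two-step chain written all three ways. The step functions are hypothetical stand-ins for "read a file" and "query the database":

```javascript
// Hypothetical async steps standing in for file and database work.
const readConfig = () => Promise.resolve('users-table');
const queryDb = (table) => Promise.resolve(`rows from ${table}`);

// 1. Nested callbacks (the classic "pyramid" shape)
readConfig().then(table => {
  queryDb(table).then(rows => {
    console.log('nested:', rows);
  });
});

// 2. A flat promise chain
readConfig()
  .then(table => queryDb(table))
  .then(rows => console.log('chained:', rows));

// 3. async/await: the same chain, reading top to bottom
async function handleRequest() {
  const table = await readConfig();
  const rows = await queryDb(table);
  return rows;
}
handleRequest().then(rows => console.log('awaited:', rows));
```

All three produce the same result; the differences are purely in how readable the chain is once it grows past two or three steps.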

This is my mental picture of what's going on... is it correct? Comments very much appreciated!

Questions

My questions concern the use of OS-supported asynchronous operations, who actually does the asynchronous work, and the ways in which this architecture is more performant than the "spawn a thread per request" (i.e. multiple cooks) architecture:

  1. Node libraries have been designed to be asynchronous by making use of the cross-platform async library libuv, correct? Is the idea here that libuv presents Node (on all platforms) with a consistent async I/O interface, but then uses platform-dependent async I/O operations under the hood? In the case where the I/O request goes "all the way down" to an OS-supported async operation, who is "doing the work" of waiting for the I/O to return and triggering Node? Is it the kernel, using a kernel thread? If not, who? In any case, how many requests can this entity handle?

  2. I've read that libuv also makes use of a thread pool (typically pthreads, one per core?) internally. Is this to 'wrap' operations that do not "go all the way down" as async, so that a thread can be used to sit and wait for a synchronous operation and libuv can still present an async API?

  3. With regard to performance, the usual illustration given to explain the performance boost a Node-like architecture can provide is this: picture the (presumably slower and fatter) thread-per-request approach. There's latency, CPU, and memory overhead in spawning a bunch of threads that are just sitting around waiting on I/O to complete (even if they're not busy-waiting) and then tearing them down, and Node largely makes this go away because it uses a long-lived event thread to dispatch async I/O to the OS/kernel, right? But at the end of the day, SOMETHING is sleeping on a mutex and getting woken up when the I/O is ready... is the idea that the kernel doing this is way more efficient than a userland thread doing it? And finally, what about the case where the request is handled by libuv's thread pool? This seems similar to the thread-per-request approach, except for the efficiency of using the pool (avoiding the bring-up and tear-down). But in this case, what happens when there are many requests and the pool has a backlog? Latency increases, and now you're doing worse than thread-per-request, right?

There are good answers here on SO that can give you a clearer picture of the architecture. However, you have some specific questions that can be answered.

Who is "doing the work" of waiting for the I/O to return and triggering Node? Is it the kernel, using a kernel thread? If not, who? In any case, how many requests can this entity handle?

Actually, both threads and async I/O are implemented on top of the same primitive: the OS event queue.

Multitasking OSes were invented in order to allow users to execute multiple programs in parallel using a single CPU core. Yes, multi-core, multi-thread systems did exist back then, but they were big (usually the size of two or three average bedrooms) and expensive (usually the cost of one or two average houses). Those systems could execute multiple operations in parallel without the help of an OS. All you need is a simple loader program (called an executive, a primitive DOS-like OS), and you can create threads in assembly directly without the help of an OS.

Cheaper, mass-produced computers could only run one thing at a time. For a long time this was acceptable to users. However, people who had gotten used to time-sharing systems wanted more from their computers. Thus processes and threads were invented.

But below the OS there are no threads: the OS itself provides the threading service (well... technically you CAN implement threading as a library without needing OS support). So how does the OS implement threads?

Interrupts. It's the core of all asynchronous processing.

A process or thread is simply an event waiting to be processed by the CPU and managed by the OS. This is possible because the CPU hardware supports interrupts. Any thread or process waiting for an I/O event (from the mouse, disk, network, etc.) is stopped, suspended, and added to the event queue, and other processes or threads are executed during the wait. There is also a timer built into the CPU that can trigger an interrupt (amazingly enough, that interrupt is called a timer interrupt). This timer interrupt triggers the OS's process/thread management system, so you can still run multiple processes in parallel even if none of them are waiting for I/O events.

This is the core of multitasking. This kind of programming (using timers and interrupts) is normally not taught except in operating system design, embedded programming (where you often need to do OS-like things without an OS), and real-time programming.

So, what's the difference between async I/O and processes?

They're exactly the same thing, except for the API the OS exposes to the programmer:

  • Process / threads: Hey there, programmer, pretend you are writing a simple program for a single CPU and that you have full control of the CPU. Go ahead, use my I/O. I'll maintain the illusion of you controlling the CPU while I handle the mess of running things in parallel.

  • Asynchronous I/O: You think you know better than me? OK, I'll let you add event listeners directly to my internal queue. But I'm not going to handle which function gets called when the event happens. I'm just going to rudely wake your process, and you handle all that yourself.

In the modern world of multi-core CPUs the OS still does this kind of process management, because a typical modern OS runs dozens of processes while the PC typically only has two or four cores. With multi-core machines there's another difference:

  • Process / threads: Since I'm handling the process queue for you, I guess you won't mind if I spread the load of the threads you ask me to run across several CPUs, will you? This way I'll let the hardware do the work in parallel.

  • Async I/O: Sorry, I can't spread all the different wait callbacks across different CPUs, because I have no idea what the hell your code is doing. A single core for you!

I've read that libuv also makes use of a thread pool (typically pthreads, one per core?) internally. Is this to 'wrap' operations that do not "go all the way down" as async?

Yes.

Actually, to the best of my knowledge, all OSes provide a good enough async I/O interface that you don't need thread pools. The programming language Tcl has been handling async I/O like Node, without the help of thread pools, since the 80s. But it's very messy and not so simple. Node developers decided they didn't want to handle this mess when it comes to disk I/O, and simply use the more well-tested blocking file API with threads.

But at the end of the day, SOMETHING is sleeping on a mutex and getting woken up when the I/O is ready

I hope my answer to (1) also answers this question. But if you want to know what that something is, I suggest you read about the select() function in C. If you know C programming, I suggest you try writing a TCP/IP program without threads using select(). Google "select c". I have a much more detailed explanation of how this all works at the C level in another answer: I know that callback function runs asynchronously, but why?

What happens when there are many requests and the pool has a backlog? Latency increases, and now you're doing worse than thread-per-request, right?

I hope that once you understand my answer to (1), you will also realize that there's no escape from the backlog even if you use threads. The hardware does not really have support for OS-level threads. Hardware threads are limited to the number of cores, so at the hardware level the CPU is a thread pool. The difference between single-threaded and multi-threaded is simply that multi-threaded programs can really execute several threads in parallel in hardware, while single-threaded programs can use only a single CPU.

The only REAL difference between async I/O and traditional multi-threaded programs is thread creation latency. In this sense, there's no advantage that programs like node.js have over programs that use thread pools, like nginx and apache2.

However, due to the way CGI works, programs like node.js will still have higher throughput, because you don't have to re-initialize the interpreter and all the objects in your program for each request. This is why most languages have moved to web frameworks that run as an HTTP service (like Node's Express.js) or something like FastCGI.


Note: Do you really want to know what the big deal about thread creation latency is? In the late 90s / early 2000s there was a web server benchmark. Tcl, a language notoriously 500% slower than C on average (because it's based on string processing, like bash), managed to outperform Apache (this was before apache2, and it triggered the complete re-architecture that created apache2). The reason is simple: Tcl had a good async I/O API, so programmers were more likely to use async I/O. This alone beat a program written in C (not that C does not have async I/O; Tcl was written in C, after all).

The core advantage of node.js over languages like Java is not that it has async I/O. It's that async I/O is pervasive and the API (callbacks, promises) is easy to use, so you can write an entire program using async I/O without needing to drop down to assembly or C.

If you think callbacks are hard to use, I strongly suggest you try writing that select()-based program in C.
