
How does single-threaded Node.js handle requests concurrently?

I am currently learning the Node.js platform in depth. As we know, Node.js is single-threaded, and if it executes a blocking operation (for example fs.readFileSync), the thread has to wait for that operation to finish. I decided to run an experiment: I created a server that responds with a huge amount of data from a file on each request:

const { createServer } = require('http');
const fs = require('fs');

const server = createServer();

server.on('request', (req, res) => {
   let data;
   data = fs.readFileSync('./big.file');
   res.end(data);
});

server.listen(8000);

Also, I launched 5 terminals in order to make parallel requests to the server. I expected to see that while one request was being handled, the others would have to wait for the first request's blocking operation to finish. However, the other 4 requests were responded to concurrently. Why does this behavior occur?

What you're likely seeing is one of two things: either some asynchronous part of the implementation inside res.end() that actually sends your large amount of data, or all the data being sent very quickly and serially while the clients can't process it fast enough to show it arriving serially. Because the clients are each in their own separate process, they "appear" to show the data arriving concurrently just because they react too slowly to reveal the actual arrival sequence.

One would have to use a network sniffer to see which of these is actually occurring, or run some different tests, or put some logging inside the implementation of res.end(), or tap into some logging inside the client's TCP stack, to determine the actual order of packet arrival among the different requests.


If you have one server and it has one request handler doing synchronous I/O, then you will not get multiple requests processed concurrently. If you believe that is happening, then you will have to document exactly how you measured or concluded that (so we can help you clear up your misunderstanding), because that is not how node.js works when using blocking, synchronous I/O such as fs.readFileSync().

node.js runs your JS as a single thread, and when you use blocking, synchronous I/O, it blocks that one single thread of Javascript. That's why you should never use synchronous I/O in a server, except perhaps in startup code that only runs once during startup.

What is clear is that fs.readFileSync('./big.file') is synchronous, so your second request will not start being processed until the first fs.readFileSync() is done. And, calling it on the same file over and over again will be very fast because of OS disk caching.

But, res.end(data) is non-blocking and asynchronous. res is a stream, and you're giving the stream some data to process. It will send out as much as it can over the socket, but if it gets flow-controlled by TCP, it will pause until there's more room to send on the socket. How much that happens depends upon all sorts of things about your computer, its configuration, and the network link to the client.

So, what could be happening is this sequence of events:

  1. The first request arrives, does fs.readFileSync(), and calls res.end(data). That starts sending data to the client, but returns before it is done because of TCP flow control. This sends node.js back to its event loop.

  2. The second request arrives, does fs.readFileSync(), and calls res.end(data). That starts sending data to the client, but returns before it is done because of TCP flow control. This sends node.js back to its event loop.

  3. At this point, the event loop might start processing the third or fourth request, or it might service some more events (from inside the implementation of res.end() or the writeStream from the first request) to keep sending more data. If it does service those events, it could give the appearance (from the client's point of view) of true concurrency among the different requests.

Also, the client could be causing it to appear sequenced. Each client is reading a different buffered socket, and if they are all in different terminals, then they are multi-tasked. So, if there is more data on each client's socket than it can read and display immediately (which is probably the case), then each client will read some, display some, read some more, display some more, and so on. If the delay between sending each client's response on your server is smaller than the delay in reading and displaying on the client, then the clients (which are each in their own separate processes) appear to run concurrently.


When you are using asynchronous I/O such as fs.readFile(), then properly written node.js Javascript code can have many requests "in flight" at the same time. They don't actually run concurrently at exactly the same time, but one can run, do some work, launch an asynchronous operation, and then give way to let another request run. With properly written asynchronous I/O, there can be an appearance from the outside world of concurrent processing, even though it's more akin to sharing the single thread whenever a request handler is waiting for an asynchronous I/O request to finish. But, the server code you show does not use this cooperative, asynchronous I/O.

This may not be related directly to your question, but I think it is useful:

You can use a stream instead of reading the full file into memory, for example:

const { createServer } = require('http');
const fs = require('fs');

const server = createServer();

server.on('request', (req, res) => {
   const readStream = fs.createReadStream('./big.file'); // Here we create the stream.
   readStream.pipe(res); // Here we pipe the readable stream to the res writable stream.
});

server.listen(8000);

The point of doing this is:

  • It looks nicer.
  • You don't store the full file in RAM.

This works better because it is non-blocking, and the res object is already a stream, which means the data will be transferred in chunks.

Ok, so streams = chunked.

Why not read chunks from the file and send them in real time, instead of reading a really big file into memory and dividing it into chunks afterwards?

Also, why is this really important on a real production server?

Because every time a request is received, your code is going to load that big file into RAM. Add to that that this is concurrent, so you are expecting to serve multiple files at the same time. So let's do the most advanced math my poor education allows:

1 request for a 1 GB file = 1 GB in RAM

2 requests for a 1 GB file = 2 GB in RAM

etc.

That clearly doesn't scale nicely, right?

Streams allow you to decouple that data from the current state of the function (inside that scope), so in simple terms it's going to be (with a 16 KB chunk size; note that fs.createReadStream actually defaults to a 64 KB highWaterMark, but the idea is the same):

1 request for a 1 GB file = 16 KB in RAM

2 requests for a 1 GB file = 32 KB in RAM

etc.

And also, the OS is already passing a stream to node (fs), so it works with streams end to end.

Hope it helps :D.

PS: Never use sync operations (blocking) inside async operations (non-blocking).
