
Node.js event loop not making sense to me

I'm new to Node.js. I've been working my way through "Node.js the Right Way" by Jim R. Wilson, and I'm running into a contradiction in the book (and in Node.js itself?) that I haven't been able to reconcile to my satisfaction with any amount of googling.

It's stated repeatedly in the book, and in other resources I have looked at online, that Node.js runs a callback in response to some event, line by line until completion, and only then does the event loop proceed with waiting for or invoking the next callback. And because Node.js is single-threaded (and, short of explicitly doing anything with the cluster module, also runs as a single process), my understanding is that there is only ever, at most, one chunk of JavaScript code executing at a time.

Am I understanding that correctly? Here's the contradiction (in my mind). How is Node.js so highly concurrent if this is the case?

Here is an example straight from the book that illustrates my confusion. It is intended to walk a directory of many thousands of XML files and extract the relevant bits of each into a JSON document.

First the parser:

'use strict';
const
  fs = require('fs'),
  cheerio = require('cheerio');

module.exports = function(filename, callback) {
  fs.readFile(filename, function(err, data){
    if (err) { return callback(err); }
    let
      $ = cheerio.load(data.toString()),
      collect = function(index, elem) {
        return $(elem).text();
      };

    callback(null, {
      _id: $('pgterms\\:ebook').attr('rdf:about').replace('ebooks/', ''),
      title: $('dcterms\\:title').text(),
      authors: $('pgterms\\:agent pgterms\\:name').map(collect),
      subjects: $('[rdf\\:resource$="/LCSH"] ~ rdf\\:value').map(collect)
    });
  });
};

And the bit that walks the directory structure:

'use strict';
const
  file = require('file'),
  rdfParser = require('./lib/rdf-parser.js');

console.log('beginning directory walk');

file.walk(__dirname + '/cache', function(err, dirPath, dirs, files){
  files.forEach(function(path){
    rdfParser(path, function(err, doc) {
      if (err) {
        throw err;
      } else {
        console.log(doc);
      }
    });
  });
});

If you run this code, you will get an error because the program exhausts all available file descriptors. This would seem to indicate that the program has opened thousands of files concurrently.

My question is... how can this possibly be, unless the event model and/or concurrency model behave differently from how they have been explained?

I'm sure someone out there knows this and can shed light on it, but for the moment, color me very confused!

Am I understanding that correctly?

Yes.

How is Node.js so highly concurrent if this is the case?

It is not the JavaScript execution itself that is concurrent; the I/O (and other heavy tasks) is. When you call an asynchronous function, it starts the task (for example, reading a file) and returns immediately to "run the next line of the script", as you put it. The task, however, continues in the background (concurrently) to read the file, and once it has finished, the callback that was assigned to it is put onto the event loop queue, which calls it with the then-available data.
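To make the "returns immediately" part concrete, here is a minimal sketch that records the order in which things happen around a single fs.readFile call. The helper name traceRead and the throwaway file are my own, purely for illustration.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Throwaway input file so the sketch is self-contained (hypothetical name).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'event-loop-demo-'));
const demoFile = path.join(demoDir, 'hello.txt');
fs.writeFileSync(demoFile, 'hello');

// Records the order of the three steps and hands it to `done`.
function traceRead(filename, done) {
  const order = ['calling readFile'];
  fs.readFile(filename, function (err) {
    // Runs later, from the event loop, once the read has finished.
    order.push('callback');
    done(err, order);
  });
  // Reached right away: readFile only started the read.
  order.push('readFile returned');
}

traceRead(demoFile, function (err, order) {
  if (err) { throw err; }
  console.log(order.join(' -> '));
  // prints: calling readFile -> readFile returned -> callback
});
```

The callback always comes last, because it cannot run until the code that called readFile has finished and control is back in the event loop.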

For details on this "in the background" processing, and how Node actually manages to run all these asynchronous tasks in parallel, have a look at the question "Nodejs Event Loop".

This is a pretty simple description, and it skips a lot of things.

files.forEach is not asynchronous. Therefore the code goes through the list of files in the directory, calling fs.readFile on each one, and then returns to the event loop.
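You can check this yourself. The sketch below (the function name and the five temporary files are made up for the demonstration) counts how many reads have been started by the time the first fs.readFile callback runs.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Create a few throwaway files so the sketch is self-contained.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'foreach-demo-'));
const files = [];
for (let i = 0; i < 5; i++) {
  const name = path.join(dir, 'file' + i + '.txt');
  fs.writeFileSync(name, 'contents ' + i);
  files.push(name);
}

// Starts a read for every file, then reports how many reads had been
// started by the time the first callback ran.
function countStartedAtFirstCallback(files, done) {
  let started = 0;
  let finished = 0;
  let startedAtFirstCallback = null;
  let failed = false;

  files.forEach(function (name) {
    started++; // synchronous: happens for every file before forEach returns
    fs.readFile(name, function (err) {
      if (failed) { return; }
      if (err) { failed = true; return done(err); }
      if (startedAtFirstCallback === null) {
        startedAtFirstCallback = started;
      }
      finished++;
      if (finished === files.length) { done(null, startedAtFirstCallback); }
    });
  });
}

countStartedAtFirstCallback(files, function (err, count) {
  if (err) { throw err; }
  console.log(count + ' of ' + files.length + ' reads were started before the first callback ran');
});
```

The count equals the number of files: no callback gets a chance to run until forEach has gone through the whole list.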

The loop then has a load of file-open events to process, which in turn queue up file-read events. Then the loop can start going through and calling the callbacks passed to fs.readFile with the data that has been read. These can only be called one at a time: as you say, there's only one thread executing JavaScript at any one time.

However, before any of these callbacks are called, you've already opened every file in that original list, leading to file handle exhaustion if there were too many.
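One common way around this is to cap how many reads are in flight at once. What follows is only a rough sketch using Node's built-in fs module, not the book's solution: readAllLimited and its limit parameter are names I made up, and it assumes limit is at least 1.

```javascript
'use strict';
const fs = require('fs');
const os = require('os');
const path = require('path');

// Reads every file in `files`, keeping at most `limit` reads in flight.
// Calls done(err) on the first failure, or done(null, results, maxInFlight).
function readAllLimited(files, limit, done) {
  const results = new Array(files.length);
  let next = 0;        // index of the next file to start
  let inFlight = 0;    // reads started but not yet finished
  let maxInFlight = 0; // high-water mark, kept only for the demonstration
  let failed = false;

  function startMore() {
    if (next === files.length && inFlight === 0) {
      return done(null, results, maxInFlight);
    }
    while (inFlight < limit && next < files.length) {
      const index = next++;
      inFlight++;
      maxInFlight = Math.max(maxInFlight, inFlight);
      fs.readFile(files[index], function (err, data) {
        inFlight--;
        if (failed) { return; }
        if (err) { failed = true; return done(err); }
        results[index] = data;
        startMore(); // a slot is free again: start the next read, if any
      });
    }
  }

  startMore();
}

// Usage with throwaway files (hypothetical names).
const demoDir = fs.mkdtempSync(path.join(os.tmpdir(), 'limit-demo-'));
const demoFiles = [];
for (let i = 0; i < 20; i++) {
  const name = path.join(demoDir, 'file' + i + '.txt');
  fs.writeFileSync(name, 'contents ' + i);
  demoFiles.push(name);
}

readAllLimited(demoFiles, 3, function (err, results, maxInFlight) {
  if (err) { throw err; }
  console.log('read ' + results.length + ' files, at most ' + maxInFlight + ' at a time');
});
```

A new read starts only when an earlier one has called back, so the number of files open at any moment stays around the limit instead of growing with the size of the directory.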

I think OrangeDog's answer is the correct answer to your specific question. But maybe you'll find this short and awesome presentation by Philip Roberts helpful; it explains the concept of the event loop and the asynchronous processing of JavaScript really nicely. Note that the video is not specific to Node.js, because these principles apply to all JavaScript code.

Notice: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address.

 