
How do I aggregate promises generated from async functions within a Node.js stream callback?

I have a Node.js TypeScript program in which I'm trying to parse large CSV files line by line and do something with those lines asynchronously. More specifically, I need a function that will:

  1. Open a CSV file.
  2. Parse the next line to an object.
  3. (Ideally) Collect a set number of objects for batch processing.
  4. Pass the object(s) to an async function for processing (returns a promise).
  5. Collect the promises from the processing function.

Some requirements and considerations:

  • I need to poll any of these promises for progress (see the sketch after this list).
  • Assume these CSV files are large; streaming line by line is necessary.
  • I shouldn't block the application while these processing operations are running.
  • Returning an array of promises may not be the right approach, especially if I'm trying to read a file with a million lines.
  • I need a hook of sorts to cancel or retry a failed operation.
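
For the progress requirement above, one option (not part of the original post) is a small helper that counts settled promises so progress can be polled; the trackProgress name and its shape are purely illustrative:

function trackProgress<T>(promises: Promise<T>[]): { settled: () => number; total: number } {
    // Illustrative sketch: bump a counter whenever a promise settles,
    // whether it fulfilled or rejected, so callers can poll for progress.
    let settled = 0;
    promises.forEach((p) => p.then(() => { settled++; }, () => { settled++; }));
    return { settled: () => settled, total: promises.length };
}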

Here's some test code I've gotten working. ObjectStream is a custom Node.js Transform that converts CSV lines to objects.

import * as fs from "fs";
import * as q from "q";
// LineStream and ObjectStream are the custom Transform streams described above
// (splitting the file into lines and converting CSV lines to objects);
// their implementations, doSomethingAsync, and filePath are defined elsewhere.

function parseFileAsync(filePath: string): Promise<any> {
    var doParseFileAsync = (filePath: string) => {
        var streamDeferred = q.defer<Promise<any>[]>();
        var promises: Promise<any>[] = [];
        var propertyNames: string[] = [];

        var stream = fs.createReadStream(filePath, { encoding: "utf8" })
            .pipe(new LineStream({ objectMode: true }))
            .pipe(new ObjectStream({ objectMode: true }));

        stream.on("readable", () => {
            var obj: Object;
            // Drain everything currently available from the stream.
            while ((obj = stream.read()) !== null) {
                console.log(`\nRead an object...`);

                // Wrap each processing operation in its own deferred and keep
                // the resulting promise so they can be aggregated later.
                var operationDeferred = q.defer<any>();
                operationDeferred.resolve(doSomethingAsync(obj));
                promises.push(operationDeferred.promise);
            }
        });
        stream.on("end", () => {
            // Only resolve the outer deferred once the whole file has been read,
            // so the promises array is complete.
            streamDeferred.resolve(promises);
        });

        return streamDeferred.promise;
    }

    return doParseFileAsync(filePath)
        .then((result: Promise<any>[]) => {
            // Wait for every processing operation to finish.
            return q.all(result);
        });
}

parseFileAsync(filePath)
    .done((result: any[]) => {
        console.log(`\nFinished reading and processing the file:\n\t${result.toString()}`);
    });
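
For context, LineStream and ObjectStream are not shown in the post. A minimal sketch of what an ObjectStream of this kind might look like, assuming comma-separated values and a header row supplying the property names (illustrative only, not the actual implementation):

import { Transform, TransformCallback } from "stream";

class ObjectStream extends Transform {
    // Property names are taken from the first line (the CSV header row).
    private propertyNames: string[] | null = null;

    constructor(options = {}) {
        super({ ...options, objectMode: true });
    }

    _transform(line: string, _encoding: string, callback: TransformCallback): void {
        const values = line.split(",");
        if (this.propertyNames === null) {
            this.propertyNames = values;                  // header row
        } else {
            const obj: { [key: string]: string } = {};
            this.propertyNames.forEach((name, i) => obj[name] = values[i]);
            this.push(obj);                               // emit one object per CSV line
        }
        callback();
    }
}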

The final done call is executed before the stream even starts running, because parseFileAsync immediately fulfills with an empty array; the stream hasn't had a chance to push any promises yet.

After days of searching, I'm still not sure what the correct way to do this is. Node/JavaScript experts: help?

Update

The code has been updated, and my promises are now playing nicely. However, I need a way to hook into the stream and cancel the process if desired. I also need a way to retry any operations that failed.
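
As a hedged sketch of those two hooks (neither helper is from the original post; the names are illustrative): a retry wrapper around any promise-returning operation, plus a simple cancellation flag that chunked processing can check between batches:

interface CancellationToken {
    cancelled: boolean;   // set to true by the caller to stop after the current batch
}

// Retries a promise-returning operation until it succeeds or `attempts` runs out.
function withRetry<T>(operation: () => Promise<T>, attempts: number): Promise<T> {
    return operation().catch((err) => {
        if (attempts <= 1) {
            throw err;                          // out of retries; propagate the failure
        }
        return withRetry(operation, attempts - 1);
    });
}

A chunked loop like the processFile function shown below could then check token.cancelled before recursing into the next chunk, and wrap each batch in withRetry.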

I was running into some limitations in the program's architecture that wouldn't allow me to pass promises around as freely as I wanted. So instead of kicking off a bunch of promises at once, I decided to wait until the previous batch finishes before starting a new one. Here's the approach I took:

  1. Separate the stream reading into its own function that accepts a continuation token. The return value will contain the data read, as well as a continuation token if there's more data to be read (a sketch of one possible implementation follows this list):

     function readLines(filepath: string, lines: number, start: any): Promise<any> { ... } 
  2. Define a function that will run the retry-able operation. Within the body of this function, retrieve and process a chunk of data from the file. If the result has a continuation token, "recursively" call the operation function again:

     function processFile(filepath: string, next: any): Promise<any> {
         var chunkSize = 1;
         return readLines(filepath, chunkSize, next)
             .then((result) => {
                 // Do something with `result.lines` ...
                 if (result.next) {
                     return processFile(filepath, result.next);
                 }
             });
     }
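
For completeness, here is one hedged way readLines could be implemented, assuming the continuation token is a byte offset into the file (passed to fs.createReadStream's start option), UTF-8 encoding, and "\n" line endings; the ReadResult shape is illustrative, and the LineStream/ObjectStream pipeline is skipped for brevity:

import * as fs from "fs";

interface ReadResult {
    lines: string[];   // raw lines in this chunk (parse them into objects as needed)
    next?: number;     // byte offset to continue from; absent when the file is exhausted
}

function readLines(filepath: string, maxLines: number, start: number = 0): Promise<ReadResult> {
    return new Promise<ReadResult>((resolve, reject) => {
        // `start` lets the stream resume from the byte offset returned by the previous call.
        const stream = fs.createReadStream(filepath, { encoding: "utf8", start });
        const lines: string[] = [];
        let buffered = "";
        let consumed = 0;       // bytes consumed up to the end of the last complete line
        let done = false;

        stream.on("data", (chunk: string | Buffer) => {
            buffered += chunk.toString();
            let newlineIndex: number;
            while (!done && (newlineIndex = buffered.indexOf("\n")) !== -1) {
                const line = buffered.slice(0, newlineIndex);
                buffered = buffered.slice(newlineIndex + 1);
                consumed += Buffer.byteLength(line, "utf8") + 1;   // +1 for the "\n"
                lines.push(line);
                if (lines.length >= maxLines) {
                    done = true;
                    stream.destroy();   // we have a full chunk; stop reading
                    resolve({ lines, next: start + consumed });
                }
            }
        });
        stream.on("end", () => {
            if (!done) {
                if (buffered.length > 0) {
                    lines.push(buffered);   // trailing line with no final "\n"
                }
                resolve({ lines });         // no `next` token: nothing left to read
            }
        });
        stream.on("error", reject);
    });
}

With a byte-offset token like this, each call only does work proportional to the chunk it reads, rather than re-scanning the file from the beginning.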

And voila! A long-running operation that operates on chunks and is easy to report progress on.
