How to limit the concurrency of flatMap?

I'm trying to use RxJS to write a script that processes several hundred log files, each about 1 GB in size. The skeleton of the script looks like this:

Rx.Observable.from(arrayOfLogFilePath)
.flatMap(function(logFilePath){
   return Rx.Node.fromReadStream(logFilePath)
   .filter(filterLogLine)
})
.groupBy(someGroupingFunc)
.map(someFurtherProcessing)
.subscribe(...)

The code works, but notice that the filtering step starts for all log files concurrently. From a file system I/O performance perspective, it is preferable to process one file after another, or at least to limit the concurrency to a few files rather than opening all several hundred files at the same time. How can I implement this in a "functional reactive" way?

I thought of using a scheduler but could not figure out how it would help here.

You can use .merge(maxConcurrent) to limit the concurrency. Because .merge(maxConcurrent) flattens a metaobservable (an observable of observables) into an observable, you need to replace .flatMap with .map so that the output is a metaobservable ("unflat"), and then call .merge(maxConcurrent):

Rx.Observable.from(arrayOfLogFilePath)
.map(function(logFilePath){
   return Rx.Node.fromReadStream(logFilePath)
   .filter(filterLogLine)
})
.merge(2) // 2 concurrent 
.groupBy(someGroupingFunc)
.map(someFurtherProcessing)
.subscribe(...)

This code hasn't been tested (I don't have access to your development environment), but this is how to proceed. RxJS doesn't have many operators with concurrency parameters, but you can almost always do what you need with .merge(maxConcurrent).
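To make the effect of a concurrency limit concrete, here is a minimal plain-JavaScript sketch of the same idea without RxJS. It is a model of the behaviour, not the RxJS implementation: `runWithLimit` and `fakeReadFile` are hypothetical names invented for this illustration, and the "files" are simulated with timers. At most `maxConcurrent` tasks are started, and a queued task only begins once a running one completes.

```javascript
// Plain-JavaScript model of a concurrency limit (not the RxJS implementation).
// `runWithLimit` and `fakeReadFile` are hypothetical names used for illustration.

// Runs `tasks` (functions that return promises), keeping at most
// `maxConcurrent` of them in flight. Results keep the input order.
function runWithLimit(tasks, maxConcurrent) {
  return new Promise((resolve, reject) => {
    const results = new Array(tasks.length);
    let next = 0;       // index of the next task to start
    let active = 0;     // tasks currently in flight
    let finished = 0;   // tasks that have completed
    let failed = false; // set once any task rejects

    if (tasks.length === 0) {
      resolve(results);
      return;
    }

    function startNext() {
      // Fill the free slots, but never exceed the limit.
      while (!failed && active < maxConcurrent && next < tasks.length) {
        const index = next++;
        active++;
        Promise.resolve()
          .then(() => tasks[index]())
          .then((value) => {
            results[index] = value;
            active--;
            finished++;
            if (finished === tasks.length) {
              resolve(results);
            } else {
              startNext(); // a slot is free, start a queued task
            }
          })
          .catch((err) => {
            failed = true;
            reject(err);
          });
      }
    }

    startNext();
  });
}

// Simulated "file processing" that records how many tasks are open at once.
let open = 0;
let peak = 0;
function fakeReadFile(name) {
  return () => new Promise((resolve) => {
    open++;
    peak = Math.max(peak, open);
    setTimeout(() => {
      open--;
      resolve(name + " done");
    }, 10);
  });
}

const files = ["a.log", "b.log", "c.log", "d.log", "e.log"];
runWithLimit(files.map(fakeReadFile), 2).then((results) => {
  console.log(results.length + " files processed, peak concurrency " + peak);
});
```

Note that the tasks are passed as functions rather than as promises that are already running. This mirrors why the answer above maps each path to an inner observable first: the work must stay unstarted until a slot is free.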

I have just solved a similar problem with RxJS 5, so I hope this solution helps others with a similar problem.

// Simulate always processing 2 requests in parallel (when one is finished it starts processing one more),
// retry two times, push error on stream if retry fails.

//const Rx = require('rxjs-es6/Rx');

// -- Global variable just to show that it works. --
let parallelRequests = 0;
// --------------------------------------------------

function simulateRequest(req) {
    console.log("Request " + req);
    // --- To log retries ---
    var retry = 0;
    // ----------------------

    // Can't retry a promise, need to restart before the promise is made.
    return Rx.Observable.of(req).flatMap(req => new Promise((resolve, reject) => {

        var random = Math.floor(Math.random() * 2000);

        // -- To show that it works --
        if (retry) {
            console.log("Retrying request " + req + ", retry " + retry);
        } else {
            parallelRequests++;
        }
        // ---------------------------

        setTimeout(() => {
            if (random < 900) {
                retry++;
                return reject(req + " !!!FAILED!!!");
            }

            return resolve(req);
        }, random);
    })).retry(2).catch(e => Rx.Observable.of(e));
}

Rx.Observable.range(1, 10)
    .flatMap(e => simulateRequest(e), null, 2)
    // -- To show that it works --
    .do(() => {
        console.log("ParallelRequests " + parallelRequests);
        parallelRequests--;
    })
    // ---------------------------
    .subscribe(
        e => console.log("Response from request " + e),
        e => console.log("Should not happen, error: " + e),
        () => console.log("Finished"));
 <script src="https://npmcdn.com/@reactivex/rxjs@5.0.0-beta.6/dist/global/Rx.umd.js"></script> 
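The comment "Can't retry a promise" in the snippet deserves a short illustration. A promise represents work that has already started, so retrying means calling the function that creates the promise again. The plain-JavaScript sketch below shows that idea on its own. `retryFactory` and `flakyRequest` are hypothetical names made up for this example, and the loop is only a rough analogue of what `.retry(2)` achieves by resubscribing (one initial attempt plus up to two retries).

```javascript
// Hypothetical helper: invokes `makePromise` afresh for every attempt.
// `retries` is the number of extra attempts after the first one.
async function retryFactory(makePromise, retries) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      // A new promise is created here on each pass through the loop.
      return await makePromise(attempt);
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}

// Simulated request that fails on its first two calls and then succeeds.
let calls = 0;
function flakyRequest() {
  calls++;
  return calls < 3
    ? Promise.reject(new Error("attempt " + calls + " failed"))
    : Promise.resolve("ok after " + calls + " calls");
}

retryFactory(flakyRequest, 2).then((value) => console.log(value));
// prints "ok after 3 calls"
```

Passing an already created promise instead of `flakyRequest` would not help, since awaiting the same rejected promise again just yields the same rejection.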
