Batch requests placed via nodejs request.get using rxjs

I am currently using the following function to create a Promise from the result of calling request.get:

var fs = require('fs');
var request = require('request');

function dlPromiseForMeta(meta) {
    return new Promise(function (resolve, reject) {

        meta.error = false;

        var fileStream = fs.createWriteStream(meta.filePath);

        fileStream.on('error', function (error) {
            meta.error = true;
            console.log('filesystem ' + meta.localFileName + ' ERROR: ' + error);
            console.log('record: ' + JSON.stringify(meta));
            reject(meta);
        });

        fileStream.on('close', function () {
            resolve(meta);
        });

        request.get({
            uri: meta.url,
            rejectUnauthorized: false,
            followAllRedirects: true,
            pool: {
                maxSockets: 1000
            },
            timeout: 10000,
            agent: false
        })
            .on('socket', function () {
                console.log('request ' + meta.localFileName + ' made');
            })
            .on('error', function (error) {
                meta.error = true;
                console.log('request ' + meta.localFileName + ' ERROR: ' + error);
                console.log('record: ' + JSON.stringify(meta));
                reject(meta);
            })
            .on('end', function () {
                console.log('request ' + meta.localFileName + ' finished');
                fileStream.close();
            })
            .pipe(fileStream);
    });
}

This works fine except when I call it too many times, as in the example below, where imagesForKeywords returns an rxjs Observable:

imagesForKeywords(keywords, numberOfResults)
    .mergeMap(function (meta) {
        meta.fileName = path.basename(url.parse(meta.url).pathname);
        meta.localFileName = timestamp + '_' + count++ + '_' + meta.keyword + '_' + meta.source + path.extname(meta.fileName);
        meta.filePath = path.join(imagesFolder, meta.localFileName);

        return rxjs.Observable.fromPromise(dlPromiseForMeta(meta));
    });

I start getting ESOCKETTIMEDOUT errors when the source observable becomes sufficiently large.

So what I would like to do is somehow batch what happens in mergeMap for every, say, 100 entries: run those 100 in parallel, run the batches serially, and then merge the results at the end.

How can I accomplish this using rxjs?

I think the simplest thing to use is bufferTime(), which triggers after a certain number of ms but also takes a trailing parameter for a maximum count.

Using a timeout seems useful in case the stream does not reach the batch limit in a reasonable time.
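
To make the count-or-timeout idea concrete, here is a minimal sketch in plain JavaScript. It is not how RxJS implements bufferTime (for instance, RxJS starts its timer when the buffer opens and can emit empty buffers); createBatcher and its parameters are hypothetical names used only for illustration.

```javascript
// Hypothetical helper (not part of RxJS): collects items and hands them to
// `onBatch` when either `maxBatch` items have arrived or `maxTimeout` ms
// have passed since the first item of the current batch.
function createBatcher(maxBatch, maxTimeout, onBatch) {
  let buffer = [];
  let timer = null;

  function flush() {
    if (timer !== null) {
      clearTimeout(timer);
      timer = null;
    }
    if (buffer.length === 0) return; // nothing to emit
    const batch = buffer;
    buffer = [];
    onBatch(batch);
  }

  return {
    push(item) {
      buffer.push(item);
      if (buffer.length >= maxBatch) {
        flush(); // count limit reached
      } else if (timer === null) {
        timer = setTimeout(flush, maxTimeout); // time limit for a partial batch
      }
    },
    flush, // call when the source is done to emit any remainder
  };
}

const batcher = createBatcher(3, 50, (batch) => console.log(batch));
['keyw1', 'keyw2', 'keyw3', 'keyw4'].forEach((k) => batcher.push(k));
// The first three keywords are logged immediately as one batch;
// 'keyw4' is logged on its own once the 50 ms timeout fires.
```

The timeout branch is what keeps a slow or short stream from leaving its last few items stuck in a partial batch.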

If that does not fit your use case, leave a comment with some more details and I will adjust accordingly.

Your code will look like this, using:

  • bufferTime - as described above
  • forkJoin - run the buffer contents in parallel and emit when all have returned
  • mergeMap - coalesce the results

imagesForKeywords(keywords, numberOfResults)
  .map(function (meta) {
    meta.fileName = path.basename(url.parse(meta.url).pathname);
    meta.localFileName = timestamp + '_' + count++ + '_' + meta.keyword + '_' + meta.source + path.extname(meta.fileName);
    meta.filePath = path.join(imagesFolder, meta.localFileName);
    return meta;
  })
  .bufferTime(maxTimeout, null, maxBatch)
  .mergeMap(items => rxjs.Observable.forkJoin(items.map(dlPromiseForMeta)))
  .mergeMap(arr => rxjs.Observable.from(arr))
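
For comparison, the same shape (items within a batch in parallel, batches one after another, results flattened at the end) can be sketched with plain promises and no RxJS. This is only an illustration under stated assumptions: chunk, runInBatches and mockDownload are hypothetical names, and mockDownload stands in for dlPromiseForMeta.

```javascript
// Split an array into consecutive chunks of at most `size` items.
function chunk(items, size) {
  const chunks = [];
  for (let i = 0; i < items.length; i += size) {
    chunks.push(items.slice(i, i + size));
  }
  return chunks;
}

// Run `task` for every item: items within a batch run in parallel,
// batches run one after another, and results are flattened in input order.
// Like forkJoin, Promise.all fails as a whole if any single task rejects.
async function runInBatches(items, batchSize, task) {
  const results = [];
  for (const batch of chunk(items, batchSize)) {
    const batchResults = await Promise.all(batch.map(task));
    results.push(...batchResults);
  }
  return results;
}

// Usage with a stand-in for dlPromiseForMeta (hypothetical mock).
const mockDownload = (meta) =>
  new Promise((resolve) => setTimeout(() => resolve(meta.keyword + '_image'), 10));

runInBatches([{ keyword: 'a' }, { keyword: 'b' }, { keyword: 'c' }], 2, mockDownload)
  .then((images) => console.log(images));
```

Because each batch is awaited before the next one starts, at most batchSize downloads are in flight at any time. In RxJS terms, running the batches one after another corresponds to mapping each buffer with concatMap rather than mergeMap, and mergeMap also accepts a concurrency limit as an argument; both are suggestions to try, not something the code above was tested with.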

Here's a runnable mockup to show it working. I have commented out the last mergeMap to show the buffering.

I have assumed a couple of things:

  • imagesForKeywords breaks keywords into an observable stream of keywords
  • there is one keyword per dlPromiseForMeta call

// Some mocking
const imagesForKeywords = (keywords, numberOfResults) => {
  return Rx.Observable.from(keywords.map(keyword => { return {keyword} }))
}
const dlPromiseForMeta = (meta) => {
  return Promise.resolve(meta.keyword + '_image')
}

// Compose meta - looks like it can run at scale, since is just string manipulations.
const composeMeta = meta => {
  // meta.fileName = path.basename(url.parse(meta.url).pathname);
  // meta.localFileName = timestamp + '_' + count++ + '_' + meta.keyword + '_' + meta.source + path.extname(meta.fileName);
  // meta.filePath = path.join(imagesFolder, meta.localFileName);
  return meta;
}

const maxBatch = 3
const maxTimeout = 50 //ms

const bufferedPromises = (keywords, numberOfResults) =>
  imagesForKeywords(keywords, numberOfResults)
    .map(composeMeta)
    .bufferTime(maxTimeout, null, maxBatch)
    .mergeMap(items => Rx.Observable.forkJoin(items.map(dlPromiseForMeta)))
    //.mergeMap(arr => Rx.Observable.from(arr))

const keywords = ['keyw1', 'keyw2', 'keyw3', 'keyw4', 'keyw5', 'keyw6', 'keyw7'];
const numberOfResults = 1;

bufferedPromises(keywords, numberOfResults)
  .subscribe(console.log)

<script src="https://cdnjs.cloudflare.com/ajax/libs/rxjs/5.5.6/Rx.js"></script>


 