
Node.js outbound http request concurrency

I've got a node.js script that pulls data from an external web API for local storage. The first request is a query that returns a list of IDs that I need to get further information on. For each ID returned, I spawn a new http request from node.js and reach out to the server for the data (POST request). Once the job is complete, I sleep for 3 minutes, and repeat. Sometimes the number of IDs is in the hundreds. Each individual http request for those returns maybe 1kb of data, usually less, so the round trip is very short.

I got an email this morning from the API provider begging me to shut off my process because I'm "occupying all of the API servers with hundreds of connections" (which I am actually pretty proud of, but that is not the point). To be nice, I increased the sleep from 3 minutes to 30 minutes, and that has helped them so far.

On to the question... I haven't set maxSockets or anything, so I believe the default is 5. Shouldn't that mean I can only have 5 live http request connections at a time? How does the admin see hundreds? Is their server not hanging up the connection once the data is delivered? Am I not doing so? I don't have an explicit disconnect at the end of my http request, so perhaps I am at fault here. So what does maxSockets actually set?

Sorry, for some reason I didn't read your question correctly.

maxSockets is the maximum number of concurrent sockets the http module's agent will open per host (host and port pair) in the current process. In Node 0.10 and earlier the default is 5; from 0.12 onward it is Infinity. You can check what yours is currently set to by reading http.globalAgent.maxSockets.
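If you would rather cap connections explicitly than rely on the default, one option is a dedicated agent. The following is a minimal sketch; the host, port, and path in `requestOptions` are placeholders rather than anything from the question, and the limit of 5 is only an example value.

```javascript
const http = require('http');

// Dedicated agent: at most 5 concurrent sockets per host.
// Requests beyond that limit wait in the agent's queue.
const limitedAgent = new http.Agent({ maxSockets: 5 });

// Placeholder request options (assumed values, adjust to the real API).
const requestOptions = {
  host: 'localhost',
  port: 8080,
  path: '/details',
  method: 'POST',
  agent: limitedAgent // route this request through the capped agent
};

console.log('Global agent maxSockets:', http.globalAgent.maxSockets);
console.log('Limited agent maxSockets:', limitedAgent.maxSockets);
```

Passing `requestOptions` to `http.request` would then send that request through `limitedAgent`, while requests without an `agent` option keep using `http.globalAgent`.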

You can see some information on the current number of connections you have to a given host with the following:

console.log("Active socket connections: %d", http.globalAgent.sockets['localhost:8080'].length )
console.log("Total queued requests: %d", http.globalAgent.requests['localhost:8080'].length)

Substitute whatever host and port you are making the request to for localhost:8080.
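Those lookups throw if the agent has no entry for the host yet, and newer Node versions build the key with extra parts after the port. A more defensive sketch, assuming `agent.getName()` is available (Node 0.11.4 and later), might look like this; `countConnections` is a hypothetical helper name, not part of Node's API.

```javascript
const http = require('http');

// Count active sockets and queued requests for one host on a given agent.
// Returns zeros when the agent has no entry for that host.
function countConnections(agent, host, port) {
  // Let the agent build its own lookup key for this host and port.
  const key = agent.getName({ host: host, port: port });
  const active = agent.sockets[key] || [];
  const queued = agent.requests[key] || [];
  return { active: active.length, queued: queued.length };
}

const counts = countConnections(http.globalAgent, 'localhost', 8080);
console.log('Active socket connections: %d', counts.active);
console.log('Total queued requests: %d', counts.queued);
```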

You can see how node handles these connections at the following two points:

Adding a new connection and storing to the request queue

https://github.com/joyent/node/blob/master/lib/_http_agent.js#L83

Creating connections from queued requests

https://github.com/joyent/node/blob/master/lib/_http_agent.js#L148


I wrote this up really quickly to give you an idea of how you could stagger those requests out a bit. This particular code doesn't check how many requests are "pending"; you could easily modify it so that only a set number of requests are in flight at any given time (which honestly would be the better way to do it).

var Stagger = function (data, stagger, fn, cb) {

    var self        = this;
    this.timerID    = 0;
    this.data       = [].concat(data);
    this.fn         = fn;
    this.cb         = cb;
    this.stagger    = stagger;
    this.iteration  = 0;
    this.store      = {};

    this.start = function () {
        (function __stagger() {

            self.fn(self.iteration, self.data[self.iteration], self.store);

            self.iteration++;

            // Use < so an empty data array cannot loop forever.
            if (self.iteration < self.data.length)
                self.timerID = setTimeout(__stagger, self.stagger);
            else
                self.cb(self.store);

        })();
    };

    this.stop = function () {
        clearTimeout(self.timerID);

    };
};


var t = new Stagger([1,2,3,4,5,6], 1000, function (i, item, store) {
    console.log(i, item);
    if (!store.out) store.out = [];

    store.out[i] = Math.pow(2,i);
},
function (store) {
    console.log('Done!', store);
});

t.start();

This code could definitely be improved, but it should give you an idea of where to start.
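The "set number of requests at any given time" variant mentioned earlier could look roughly like the sketch below. It is not the code from the demo; `runLimited` and its parameter names are made up for illustration, and the `setTimeout` in the usage stands in for a real http request.

```javascript
// Run worker(item, index, done) for every item, keeping at most `limit`
// workers in flight. Calls onComplete(results) once all have finished.
function runLimited(items, limit, worker, onComplete) {
  const results = new Array(items.length);
  let next = 0;      // index of the next item to start
  let pending = 0;   // workers currently in flight
  let finished = 0;  // workers that have called done

  if (items.length === 0) return onComplete(results);

  function launch() {
    // Start new workers until the limit is reached or nothing is left.
    while (pending < limit && next < items.length) {
      const index = next++;
      pending++;
      worker(items[index], index, function done(result) {
        results[index] = result;
        pending--;
        finished++;
        if (finished === items.length) onComplete(results);
        else launch(); // a slot freed up, start the next item
      });
    }
  }

  launch();
}

// Usage: at most 2 simulated requests in flight at once.
runLimited([1, 2, 3, 4, 5, 6], 2, function (item, index, done) {
  setTimeout(function () { done(item * 2); }, 10);
}, function (results) {
  console.log('Done!', results);
});
```

Each worker must call `done` exactly once. With real requests you would call it from the response's `end` handler and from the `error` handler, so a failed request does not stall the queue.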

Live Demo: http://jsbin.com/ewoyik/1/edit (note: requires console)
