
Handling a large number of outbound HTTP requests

I am building a feed reader application where I expect to have a large number of sources. I would request new data from each source at a given time interval (e.g., hourly) and then cache the response on my server. I am assuming that requesting data from all sources at the same time is not the optimal solution, as I will probably experience network congestion (I am curious to know if there would be any other bottlenecks too).

What would be an efficient way to perform such a large number of requests?

Thanks

Since there's no urgency to any given request and you just want to make sure you hit each one periodically, you can just space all the requests out in time.

For example, if you have N sources and you want to hit each one once an hour, you can just create a list of all the sources, and keep track of an index for which source is next. Then, calculate how far apart you can make each request and still get through them all in an hour.

So, if you had N requests to process once an hour:

// list of feed sources and the index of the next one to request
let listOfSources = [...];
let nextSourceIndex = 0;

const cycleTime = 1000 * 60 * 60;    // an hour in ms
const delta = Math.round(cycleTime / listOfSources.length);

// interval timer that cycles through the sources,
// wrapping back to the start after the last one
setInterval(() => {
    const index = nextSourceIndex;
    nextSourceIndex = (nextSourceIndex + 1) % listOfSources.length;
    processNextSource(listOfSources[index]);
}, delta);

function processNextSource(item) {
    // fetch this source and cache the response
}
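As a rough sketch of what `processNextSource()` might do, here's a fetch-and-cache version. The in-memory `Map` cache and the injectable `fetcher` parameter are my own assumptions for illustration; a real app might cache in Redis or a database instead:

```javascript
// simple in-memory cache: url -> { body, fetchedAt }
const cache = new Map();

// Fetch one source and cache its body. `fetcher` defaults to the
// global fetch but can be swapped out (e.g., for testing).
async function processNextSource(source, fetcher = fetch) {
    try {
        const res = await fetcher(source.url);
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        const body = await res.text();
        cache.set(source.url, { body, fetchedAt: Date.now() });
    } catch (err) {
        // a failed source shouldn't stop the cycle; log and move on
        console.error(`failed to fetch ${source.url}:`, err.message);
    }
}
```

Catching errors per source matters here: with feeds from many third parties, some will be down at any given time, and one failure shouldn't break the rotation.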

Note, if you have a lot of sources and it takes a little while to process each one, you may still have more than one source "in flight" at the same time, but that should be OK.

If the processing was really CPU or network heavy, you would have to keep an eye on whether you're getting bogged down and can't get through all the sources in an hour. If that was the case, depending upon the bottleneck issue, you may need either more bandwidth, faster storage or more CPUs applied to the project (perhaps using worker threads or child processes).

If the number of sources is dynamic, or the time to process each one is dynamic, and you're anywhere near your processing limits, you could make this system adaptive. If it was getting overly busy, it would automatically space things out to a cycle longer than an hour; conversely, if things were not so busy, it could visit the sources more frequently. This would require keeping track of some stats, calculating a new cycleTime value each time through the cycle, and rescheduling the timer accordingly.
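One way to sketch that adaptation is to replace the fixed setInterval with a self-rescheduling setTimeout chain, so the spacing can be recomputed each tick. The adjustment rule below, the `currentInFlightCount()` stat, and all the thresholds/factors are illustrative assumptions, not part of the original answer:

```javascript
const HOUR = 1000 * 60 * 60;

// Illustrative adjustment rule: stretch the cycle when many requests
// are still in flight, shrink it back toward the one-hour floor when
// idle. Thresholds, factors, and bounds are arbitrary example values.
function adjustCycleTime(current, inFlight) {
    if (inFlight > 10) return Math.min(current * 1.25, 4 * HOUR);  // back off, cap at 4h
    if (inFlight === 0) return Math.max(current * 0.9, HOUR);      // speed up, floor at 1h
    return current;
}

let cycleTime = HOUR;

// setTimeout chain instead of setInterval: the delta between requests
// is recalculated from the latest cycleTime on every tick
function scheduleNext(sources, index = 0) {
    const delta = Math.round(cycleTime / sources.length);
    setTimeout(() => {
        processNextSource(sources[index]);
        cycleTime = adjustCycleTime(cycleTime, currentInFlightCount());
        scheduleNext(sources, (index + 1) % sources.length);
    }, delta);
}

// stubs so the sketch is self-contained; a real app would implement these
function processNextSource(source) { /* fetch and cache the source */ }
function currentInFlightCount() { return 0; }
```

The key difference from setInterval is that each tick schedules the next one itself, so a new cycleTime takes effect immediately rather than being locked in at startup.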


There are other types of approaches too. A common technique when you have a large number of asynchronous operations to get through is to process them so that N of them are in flight at any given time (where N is a relatively small number such as 3 to 10). This generally avoids overloading local resources (memory usage, sockets in flight, bandwidth, etc...) while still allowing some parallelism on the network side of things. This is the type of approach you might use if you want to get through all of them as fast as possible without overwhelming local resources, whereas the previous discussion is more about spacing them out in time.

Here's an implementation of a function called mapConcurrent() that iterates an array asynchronously with no more than N requests in flight at the same time. And, here's a function called rateMap() that is even more advanced in the types of concurrency controls it supports.
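The general shape of such a function looks like the sketch below (this is my own minimal version of the pattern, not the implementation the answer links to): prime `limit` operations, then start the next item each time one finishes, resolving with results in the original order.

```javascript
// Run fn(item, index) over each item with at most `limit` promises
// in flight at once. Resolves with results in the original order;
// rejects on the first error.
function mapConcurrent(items, limit, fn) {
    return new Promise((resolve, reject) => {
        if (items.length === 0) return resolve([]);
        const results = new Array(items.length);
        let nextIndex = 0;
        let doneCount = 0;
        let failed = false;

        function runNext() {
            if (failed || nextIndex >= items.length) return;
            const i = nextIndex++;
            fn(items[i], i).then(result => {
                results[i] = result;
                if (++doneCount === items.length) resolve(results);
                else runNext();                 // start the next item as one finishes
            }, err => {
                failed = true;
                reject(err);
            });
        }

        // prime the pump with up to `limit` parallel operations
        for (let i = 0; i < Math.min(limit, items.length); i++) runNext();
    });
}
```

For the feed-reader use case you'd call something like `mapConcurrent(listOfSources, 5, fetchAndCache)` once an hour, keeping at most five requests outstanding at any moment.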
