
Handling large number of outbound HTTP requests

I am building a feed reader application where I expect to have a large number of sources. I would request new data from each source at a given time interval (e.g., hourly) and then cache the response on my server. I am assuming that requesting data from all sources at the same time is not the most optimal solution, as I will probably experience network congestion (I am curious to know if there would be any other bottlenecks too).

What would be an efficient way to perform such a large number of requests?

Thanks

Since there's no urgency to any given request and you just want to make sure you hit each one periodically, you can simply space all the requests out in time.

For example, if you have N sources and you want to hit each one once an hour, you can create a list of all the sources and keep track of an index for which source is next. Then, calculate how far apart you can space the requests and still get through them all in an hour.

So, if you had N requests to process once an hour:

let listOfSources = [...];
let nextSourceIndex = 0;

const cycleTime = 1000 * 60 * 60;    // an hour in ms
const delta = Math.round(cycleTime / listOfSources.length);

// create interval timer that cycles through the sources
setInterval(() => {
   let index = nextSourceIndex++;
   if (index >= listOfSources.length) {
       // wrap back to start
       index = 0;
       nextSourceIndex = 1;
   }
   processNextSource(listOfSources[index]);
}, delta);

function processNextSource(item) {
   // process this source
}

Note, if you have a lot of sources and it takes a little while to process each one, you may still have more than one source "in flight" at the same time, but that should be OK.

If the processing is really CPU- or network-heavy, you would have to keep an eye on whether you're getting bogged down and can't get through all the sources in an hour. If that is the case, then depending upon the bottleneck, you may need more bandwidth, faster storage, or more CPUs applied to the project (perhaps using worker threads or child processes).

If the number of sources is dynamic, or the time to process each one is dynamic, and you're anywhere near your processing limits, you could make this system adaptive: if it was getting overly busy, it would automatically space the requests out over more than an hour, and vice versa; if things were not so busy, it could visit the sources more frequently. This would require keeping track of some stats, calculating a new cycleTime value, and adjusting the timer each time through the cycle.
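One way to sketch that adaptive variant is to replace the fixed setInterval with a self-rescheduling setTimeout whose delay is recomputed each tick. This is only an illustration of the idea: the floor/ceiling values, the running average, and the clamping policy below are my assumptions, not part of the original answer.

```javascript
// Illustrative sketch only: minCycleTime, maxCycleTime, and the running
// average are assumed parameters, not part of the original answer.
const minCycleTime = 1000 * 60 * 15;      // floor: one full pass per 15 minutes
const maxCycleTime = 1000 * 60 * 60 * 4;  // ceiling: one full pass per 4 hours

// Pick a full-cycle time that leaves room for the observed per-source cost,
// clamped between the configured floor and ceiling.
function computeCycleTime(avgProcessingMs, sourceCount) {
    const needed = avgProcessingMs * sourceCount;
    return Math.min(maxCycleTime, Math.max(minCycleTime, needed));
}

let avgProcessingMs = 0;
let processedCount = 0;

// Self-rescheduling timer: the spacing between sources is recomputed on
// every tick from the running average, so the cycle stretches under load
// and shrinks again when things quiet down.
function scheduleNext(listOfSources, index) {
    const cycleTime = computeCycleTime(avgProcessingMs, listOfSources.length);
    const delta = Math.round(cycleTime / listOfSources.length);
    setTimeout(async () => {
        const start = Date.now();
        await processNextSource(listOfSources[index]);   // defined earlier
        // update the running average of per-source processing time
        avgProcessingMs =
            (avgProcessingMs * processedCount + (Date.now() - start)) /
            ++processedCount;
        scheduleNext(listOfSources, (index + 1) % listOfSources.length);
    }, delta);
}
```

With this in place, a slow week of heavy feeds automatically widens the cycle toward the four-hour ceiling, and a quiet week tightens it back toward the fifteen-minute floor.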


There are different types of approaches too. A common procedure when you have a large number of asynchronous operations to get through is to process them such that N of them are in flight at any given time (where N is a relatively small number such as 3 to 10). This generally avoids overloading any local resources (such as memory usage, sockets in flight, bandwidth, etc.) while still allowing some parallelism on the network side of things. This is the type of approach you might use if you want to get through all of the requests as fast as possible without overwhelming local resources, whereas the previous discussion is more about spacing them out in time.

Here's an implementation of a function called mapConcurrent() that iterates an array asynchronously with no more than N requests in flight at the same time. And here's a function called rateMap() that is even more advanced in the types of concurrency control it supports.
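The implementations referenced in the answer aren't reproduced here, but as a rough sketch of the idea (the name mapConcurrent() matches the answer; this body is my own illustration, not the referenced code), a concurrency-limited mapper can be built from a small pool of workers that share one index:

```javascript
// Sketch of a concurrency-limited async mapper: runs fn over every item in
// items with at most `limit` promises in flight at once. Illustration only.
async function mapConcurrent(items, limit, fn) {
    const results = new Array(items.length);
    let nextIndex = 0;

    // Each worker pulls the next unclaimed index until the array is
    // exhausted. JS is single-threaded, so `nextIndex++` needs no locking.
    async function worker() {
        while (nextIndex < items.length) {
            const i = nextIndex++;
            results[i] = await fn(items[i], i);
        }
    }

    // Start up to `limit` workers and wait for all of them to drain the list.
    const workers = [];
    for (let w = 0; w < Math.min(limit, items.length); w++) {
        workers.push(worker());
    }
    await Promise.all(workers);
    return results;
}
```

For the feed-reader case you might call it as `await mapConcurrent(listOfSources, 5, fetchAndCacheSource)`, where fetchAndCacheSource is your per-source fetch function (a hypothetical name).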

