
Elegant way to run a lot of asynchronous "things" in batches in a loop when the total isn't known until the first "thing" returns?

The problem I'm working on is calling the Stack Exchange API (1.1) on all pages (questions, tags, whatever). But it seems like it could be a general problem too, so I'm posting here rather than on StackApps.

So the easy way is to do a preliminary call just to fetch the total, then put the rest in a loop.

But this first call could actually fetch the first page of results too and save one call.

But making this first call a special case seems to complicate the code much more than the above "easy way".

It's further complicated by the fact that I can fetch multiple pages at once, but not all of them, due to rate limits.

I'll be using JavaScript with jQuery, if they provide anything helpful.

Here is some pseudocode for what I've thought of, but I haven't been able to get it working yet:

batch_num = 0
batch_size = 1 // how many pages to fetch in each batch. 1st is just 1 so we know the total

forever {
  get_batch (batch_size)

  if (batch_num == 0) {
    calculate batch_size to use from now on based on the total number of pages and the rate limits
  }

  if (batch_num == last) {
    break
  }

  ++ batch_num
}

exit

function get_batch (batch_size) {
  for (i = 0; i < batch_size; ++i) {
    getJSON next page
  }
}

The code is oversimplified because what goes in the asynchronous callbacks is important and makes the code more complicated and harder to read.

I've tried both iterative and recursive approaches but can't get my head around the details to get it right.

So is the "easy way" the best way despite requiring an extra asynchronous call? Or is there actually a way to get something like my pseudocode working that is elegant rather than convoluted?

(If you feel this is too specialized and doesn't generalize beyond the SE API then I'm happy to migrate it to StackApps.)

Some time after posing this question I got into node.js, where dealing with asynchronous code is arguably even more important than in browser JavaScript.

One of the most popular modules / libraries for making asynchronous control flow and iteration on containers easy is Async.js by "caolan".

It includes three functions with batching support:

limit - The maximum number of iterators / tasks to run at any time.
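
To illustrate what a limit like this does, here is a minimal sketch in plain JavaScript with promises, so it runs without Async.js. It is not Async.js code and not its API: mapWithLimit, fetchAllPages, fakeFetchPage and TOTAL_PAGES are all hypothetical names made up for the example, and the fake fetch assumes each response reports the total number of pages. The first request is used both as page 1 and as the source of the total, and the remaining pages are fetched with at most `limit` requests in flight.

```javascript
// Hypothetical stand-in for one API request. It resolves with a page of
// results plus the total number of pages (an assumption for this demo).
const TOTAL_PAGES = 7; // made-up value, for the demo only

function fakeFetchPage(page) {
  return new Promise((resolve) => {
    setTimeout(() => {
      resolve({ page, totalPages: TOTAL_PAGES, items: [`item-${page}`] });
    }, 10);
  });
}

// Run worker(item) for every item with at most `limit` calls in flight.
// Results are returned in the same order as `items`.
// Assumes limit >= 1.
async function mapWithLimit(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next item that has not been started

  // Each lane keeps taking the next unstarted item until none are left.
  async function lane() {
    while (next < items.length) {
      const index = next++;
      results[index] = await worker(items[index]);
    }
  }

  const lanes = [];
  for (let i = 0; i < Math.min(limit, items.length); i++) {
    lanes.push(lane());
  }
  await Promise.all(lanes);
  return results;
}

// The first request doubles as page 1 and tells us how many pages remain.
async function fetchAllPages(fetchPage, limit) {
  const first = await fetchPage(1);
  const rest = [];
  for (let p = 2; p <= first.totalPages; p++) {
    rest.push(p);
  }
  const others = await mapWithLimit(rest, limit, fetchPage);
  return [first, ...others];
}

// Usage: fetch all pages with at most 3 requests running at once.
fetchAllPages(fakeFetchPage, 3).then((pages) => {
  console.log(pages.map((p) => p.page).join(","));
});
```

Note that this keeps `limit` requests running continuously rather than waiting for a whole batch to finish before starting the next one. Whether that suits a given rate limit is something you would have to check against the API's rules.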

I would say that in 99% of cases you should go for the easy way.

Consider that if you get a count of 100, you will be making 101 calls instead of 100. Saving that one call isn't worth the code complication, which you will probably regret if you need to change the logic later.
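
For what it's worth, the easy way can be sketched roughly like this. It is only an illustration against a fake API that counts requests. makeFakeApi, fetchTotal, fetchPage and fetchAllEasy are hypothetical names, not anything from the Stack Exchange API or jQuery, and the batch size of 10 is an arbitrary assumption rather than a real rate limit.

```javascript
// Hypothetical fake API that counts how many requests it receives.
function makeFakeApi(totalPages) {
  const api = {
    calls: 0,
    // Preliminary request: returns only the total number of pages.
    fetchTotal() {
      api.calls++;
      return Promise.resolve(totalPages);
    },
    // Regular request: returns one page of results.
    fetchPage(page) {
      api.calls++;
      return Promise.resolve({ page, items: [`item-${page}`] });
    },
  };
  return api;
}

// The "easy way": ask for the total first, then fetch every page in
// fixed-size batches, waiting for each batch before starting the next.
// Assumes batchSize >= 1.
async function fetchAllEasy(api, batchSize) {
  const totalPages = await api.fetchTotal();
  const pages = [];
  for (let start = 1; start <= totalPages; start += batchSize) {
    const end = Math.min(start + batchSize - 1, totalPages);
    const batch = [];
    for (let p = start; p <= end; p++) {
      batch.push(api.fetchPage(p));
    }
    // Promise.all preserves order, so pages stay sorted by page number.
    pages.push(...(await Promise.all(batch)));
  }
  return pages;
}

// Usage: 100 pages in batches of 10, plus the one preliminary call.
const api = makeFakeApi(100);
fetchAllEasy(api, 10).then((pages) => {
  console.log(`${pages.length} pages in ${api.calls} calls`);
});
```

With 100 pages the counter ends at 101, which is the one extra call being discussed.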
