
Elegant way to run a lot of asynchronous “things” in batches in a loop when the total isn't known until the first “thing” returns?

The problem I'm working on is calling the Stack Exchange API (1.1) on all pages (questions, tags, whatever). But in fact it seems like it could be a general problem too so I'm posting here rather than on StackApps.

So the easy way is to do a preliminary call just to fetch the total then put the rest in a loop.

But this first call could actually fetch the first page of results too and save one call.

But making this first call a special case seems to complicate the code much more than the above "easy way".

It's complicated by the fact that I can fetch multiple pages at once but not all of them due to rate limits.

I'll be using JavaScript with jQuery if they provide anything helpful.

Here is some pseudocode for what I've thought of but I haven't been able to get it working yet:

batch_num = 0
batch_size = 1 // how many pages to fetch in each batch. 1st is just 1 so we know the total

forever {
  get_batch (batch_size)

  if (batch_num == 0) {
    calculate batch_size to use from now on based on the total number of pages and the rate limits
  }

  if (batch_num == last) {
    break
  }

  ++ batch_num
}

exit

function get_batch (batch_size) {
  for (i = 0; i < batch_size; ++i) {
    getJSON next page
  }
}

The code is oversimplified because what goes in the asynchronous callbacks is important and makes the code more complicated and harder to read.

I've tried both iterative and recursive approaches but can't get my head around the details to get it right.

So is the "easy way" the best way despite requiring an extra asynchronous call? Or is there actually a way to get something like my pseudocode working that is elegant rather than convoluted?

(If you feel this is too specialized and doesn't generalize beyond the SE API then I'm happy to migrate it to StackApps.)

Some time after posing this question I got into Node.js, where dealing with asynchronous code is arguably even more important than in browser JavaScript.

One of the most popular modules / libraries for making asynchronous control flow and iteration over collections easy is Async.js by "caolan".

It includes three functions with batching support:

limit - The maximum number of iterators / tasks to run at any time.
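
To illustrate how a limit like this could apply to the question, here is a minimal callback-style sketch. It is not Async.js's code: mapWithLimit and fetchAllPages are hypothetical helpers written for this example, and fetchPage(page, done) is an assumed wrapper around the real request (for instance around $.getJSON) that calls done(err, { total, items }). The first page is fetched on its own, its total determines how many pages remain, and the rest are fetched with at most limit requests in flight.

```javascript
// Hypothetical helper, NOT part of Async.js: calls worker(item, done) for every
// item, keeps at most `limit` calls in flight, and returns results in input order.
function mapWithLimit(items, limit, worker, callback) {
  const results = new Array(items.length);
  let next = 0;       // index of the next item to start
  let running = 0;    // calls currently in flight
  let finished = 0;   // calls that have completed successfully
  let failed = false; // set once so the callback fires only one time on error

  if (items.length === 0) return callback(null, results);

  function launch() {
    while (!failed && running < limit && next < items.length) {
      const index = next++;
      running++;
      worker(items[index], function (err, value) {
        if (failed) return;
        running--;
        if (err) { failed = true; return callback(err); }
        results[index] = value;
        finished++;
        if (finished === items.length) return callback(null, results);
        launch(); // a slot is free again, start the next item
      });
    }
  }
  launch();
}

// Assumption: fetchPage(page, done) calls done(err, { total, items }),
// where `total` is the total number of items across all pages.
function fetchAllPages(fetchPage, pageSize, limit, callback) {
  fetchPage(1, function (err, first) {
    if (err) return callback(err);
    const pageCount = Math.ceil(first.total / pageSize);
    const rest = [];
    for (let p = 2; p <= pageCount; p++) rest.push(p);
    mapWithLimit(rest, limit, fetchPage, function (err, others) {
      if (err) return callback(err);
      callback(null, [first].concat(others)); // page 1 is reused, not refetched
    });
  });
}
```

The first request stops being a special case inside the loop: it simply runs before the limited phase starts. Note that this only caps concurrency; a rate limit expressed as requests per time window would need a delay on top of it. With Async.js itself, the same list of remaining page numbers could presumably be handed to one of its limit-taking functions such as mapLimit instead of the hand-written helper.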

I would say that in 99% of cases you should go for the easy way.

Consider that if you get a count of 100, you will be making 101 calls instead of 100. That one extra call isn't worth the added code complexity, which you will probably regret if you need to change the logic later.
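
The arithmetic can be made concrete with a tiny sketch. callOverhead is a hypothetical helper for illustration only, and pageCount stands for the "count" mentioned above, assuming one request per page.

```javascript
// Hypothetical helper: compares the number of requests for both strategies.
// easyWay  = one preliminary count request plus one request per page
// combined = the first page request doubles as the count request
function callOverhead(pageCount) {
  const combined = pageCount;
  const easyWay = pageCount + 1;
  return {
    easyWay: easyWay,
    combined: combined,
    extraFraction: (easyWay - combined) / combined // relative cost of the extra call
  };
}

console.log(callOverhead(100)); // 101 calls versus 100
```

The relative cost of the extra call shrinks as the number of pages grows, which is the case where batching matters at all.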
