Idiom for async iteration over large data in JavaScript ES6

Is there an idiom in ES6 for iterating over large datasets without triggering a browser timeout?

Let's say I need to do something like generate 16 million cubes, and a straightforward loop times out the browser.

function generateCubes(num) {
  var cubes = [];
  for (var ii = 0; ii < num; ++ii) {
     cubes.push(generateCube());
  }
  return cubes;
}

var cubes = generateCubes(16000000);

So I can turn that into an async callback like this:

function generateCubes(num, callback) {
  var maxPerIteration = 100000;
  var cubes = [];

  function makeMore() {
    var count = Math.min(num, maxPerIteration);
    for (var ii = 0; ii < count; ++ii) {
      cubes.push(generateCube());
    }
    num -= count;
    if (count) {
      setTimeout(makeMore, 0);
    } else {
      callback(cubes);
    }
  }
  makeMore();
}

but sadly I suddenly have to restructure all my code:

generateCubes(16000000, function(cubes) {
   ...
   // all the code that used to be after cubes = generateCubes
});    

I can turn that into something promise-based, but that only adds to the amount of boilerplate.

In either case, I suppose I could write a generic version:

function generateThings(factory, num, callback) {
  var maxPerIteration = 100000;
  var things = [];

  function makeMore() {
    var count = Math.min(num, maxPerIteration);
    for (var ii = 0; ii < count; ++ii) {
      things.push(factory());
    }
    num -= count;
    if (num) {
      setTimeout(makeMore, 0);
    } else {
      callback(things);
    }
  }
  makeMore();
}

In this particular case I'm generating 16 million things, which is a kind of iteration. Maybe next I want to iterate over those things.

function forEachAllTheThings(things, op, callback) {
  var maxPerIteration = 100000;
  var num = things.length;
  var start = 0;  // index of the first item not yet processed

  function doMore() {
    var count = Math.min(num, maxPerIteration);
    for (var ii = 0; ii < count; ++ii) {
      op(things[start + ii]);
    }
    start += count;
    num -= count;
    if (num) {
      setTimeout(doMore, 0);
    } else {
      callback();
    }
  }
  doMore();
}

Is there a more ES6 way of doing this that is more concise or more generic?

NOTE: Please don't get hung up on generating cubes. That's not the question. Also, it's not just about the timeout issue; it can also be a jank issue. For example, I once worked on a project that needed to deserialize a scene graph. A moderately complicated graph might take 5-10 seconds to deserialize (turn into objects). During those 5-10 seconds the browser was frozen.

The solution was similar to forEachAllTheThings above in that we only read through N objects per tick so as not to lock up the browser. It was all custom code. I'm just wondering whether any of the new ES6 features simplify spreading a lot of work over multiple ticks, the same way they seem to simplify async code (this is, in a sense, a form of async code).


Update

Based on @Bergi's suggestion of promisifying setTimeout, I think this is what was being suggested.

// returns a Promise that resolves after `time` milliseconds
function sleep(time) {
  return new Promise(function(resolve, reject) {
    setTimeout(resolve, time);
  });
}

// returns a promise that resolves to an array of things
function generateThings(factory, num) {
  var maxPerIteration = 100000;
  var things = [];

  function makeMore() {
    var count = Math.min(num, maxPerIteration);
    for (var ii = 0; ii < count; ++ii) {
      things.push(factory());
    }
    num -= count;
    return num ? sleep(0).then(makeMore) : things;
  }

  // we need to start off with one promise
  // in case num <= maxPerIteration
  return Promise.resolve(makeMore());
}

function generateCube() {
  return Math.random();  // could be anything
}

generateThings(generateCube, 300000)
.then(function(things) {
  console.log(things.length);
});

I suppose that is slightly ES6ified and a couple of lines shorter, assuming you already have sleep in your code (which seems like a reasonable assumption).
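A generator can also carry the loop, so that the chunking logic lives in one reusable driver. The sketch below is only an illustration under a few assumptions: `runInSlices`, its `budgetMs` parameter, and `makeThings` are hypothetical names that do not appear in the code above, `sleep` is repeated so the block runs on its own, and each slice is bounded by elapsed time rather than by a fixed item count.

```javascript
// Promise-returning delay, same idea as the sleep helper above.
function sleep(time) {
  return new Promise(function (resolve) {
    setTimeout(resolve, time);
  });
}

// Hypothetical driver: pulls from any iterator for up to `budgetMs`
// milliseconds, then yields to the event loop before continuing.
// Resolves to the iterator's return value.
function runInSlices(iterator, budgetMs) {
  function step() {
    var deadline = Date.now() + budgetMs;
    var result;
    do {
      result = iterator.next();
    } while (!result.done && Date.now() < deadline);
    return result.done ? result.value : sleep(0).then(step);
  }
  // Starting inside a promise turns a throw from the iterator into a rejection.
  return Promise.resolve().then(step);
}

// The work itself stays a plain loop; each `yield` marks a point
// where the driver is allowed to pause.
function* makeThings(factory, num) {
  var things = [];
  for (var ii = 0; ii < num; ++ii) {
    things.push(factory());
    yield;
  }
  return things;
}

runInSlices(makeThings(Math.random, 300000), 10).then(function (things) {
  console.log(things.length);
});
```

Yielding after every item adds a small cost per iteration. Yielding once every few thousand items would reduce that overhead, at the price of coarser slices.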

I'd probably offload the generation of the cubes to a web worker, which won't have the timeout problem, assuming that the cubes consist only of basic JavaScript types and so could be posted to the main UI thread when ready. Ideally, the cubes would be transferable objects, so you wouldn't have to clone them but could instead transfer them from the worker thread to the main UI thread.
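A rough sketch of that worker approach, assuming a browser environment and assuming each cube can be represented as a single number in a `Float64Array` (an assumption made purely for illustration; real cubes would need their own packing scheme). `workerMain` and `generateInWorker` are hypothetical names. The buffer is listed in the transfer list of `postMessage`, so it is moved to the main thread rather than cloned.

```javascript
// Runs inside the worker: fills a typed array and transfers its buffer back.
function workerMain() {
  self.onmessage = function (e) {
    var values = new Float64Array(e.data.num);
    for (var ii = 0; ii < values.length; ++ii) {
      values[ii] = Math.random();  // stand-in for generateCube()
    }
    // The second argument lists transferables: the buffer is moved, not copied.
    self.postMessage(values.buffer, [values.buffer]);
  };
}

// Browser-only: starts a worker built from workerMain and returns a promise
// that resolves to a Float64Array holding the generated values.
function generateInWorker(num) {
  return new Promise(function (resolve, reject) {
    var source = '(' + workerMain.toString() + ')()';
    var blob = new Blob([source], { type: 'application/javascript' });
    var url = URL.createObjectURL(blob);
    var worker = new Worker(url);

    function cleanUp() {
      worker.terminate();
      URL.revokeObjectURL(url);
    }
    worker.onmessage = function (e) {
      cleanUp();
      resolve(new Float64Array(e.data));
    };
    worker.onerror = function (err) {
      cleanUp();
      reject(err);
    };
    worker.postMessage({ num: num });
  });
}
```

In a page this would be called as `generateInWorker(16000000).then(function (cubes) { ... })`, with the main thread staying responsive while the worker runs.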
