在Javascript（V8）中，为什么数组上的forEach比简单的for循环消耗更多的内存？

Question

I'm performing some simple data validation on a large set of data in Node.js (version v7.5.0, with a matrix of 15849x12771 entries). 我正在对Node.js中的大量数据执行一些简单的数据验证（版本v7.5.0，矩阵为15849x12771条目）。 The entire data set is in memory for now, for performance reasons. 出于性能原因，整个数据集现在都在内存中。 Therefore it is critical for me to reduce the amount of memory consumed to a theoretical minimum (each number representing 8 bytes in JS). 因此，对于我来说，将消耗的内存量减少到理论最小值（每个数字代表JS中的8个字节）至关重要。

Please compare the following ways of achieving the same thing. 请比较以下实现相同方法的方法。

with forEach 与forEach

  regressData.forEach((yxa, yxaIndex) => {
    yxa.forEach((yx, yxIndex) => {
      if (!_.isFinite(yx)) {
        throw new Error(`non-finite entry at [${yxaIndex}, ${yxIndex}]`);
      }
    });
  });

This consumes all of my node process' memory at 4GB+, causing it to never (until my patience runs out anyway) finish the loop (I guess it will use slower swap memory). 这消耗了我4GB +的所有节点进程内存，导致它永远不会（直到我耐心耗尽）完成循环（我猜它会使用较慢的交换内存）。

And then the identical version with a typical for : 然后同一版采用了典型for ：

  for (var yxai = 0, yxal = regressData.length; yxai < yxal; yxai++) {
    const yx = regressData[yxai];
    for (var yxi = 0, yxl = yx.length; yxi < yxl; yxi++) {
      if (!_.isFinite(yx[yxi])) {
        throw new Error(`non-finite entry at [${yxai}, ${yxi}]`);
      }
    }
  }

This consumes virtually no extra memory, causing the validation to be done in less than a second. 这几乎不消耗额外的内存，导致验证在不到一秒的时间内完成。

Is this behavior as expected? 这种行为是否符合预期？ I had anticipated that because the forEach s have closed scopes there would be no issues of additional memory usage when compared to a traditional for loop. 我曾预料到，因为forEach已经关闭了范围，所以与传统的for循环相比，不存在额外的内存使用问题。

EDIT: standalone test 编辑：独立测试

node --expose-gc test_foreach.js node --expose-gc test_foreach.js

if (!gc) throw new Error('please run node like node --expose-gc test_foreach.js');

const _ = require('lodash');

// prepare data to work with

const x = 15849;
const y = 12771;

let regressData = new Array(x);
for (var i = 0; i < x; i++) {
  regressData[i] = new Array(y);
  for (var j = 0; j < y; j++) {
    regressData[i][j] = _.random(true);
  }
}

// for loop
gc();
const mb_pre_for = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
console.log(`memory consumption before for loop ${mb_pre_for} megabyte`);
validateFor(regressData);
gc();
const mb_post_for = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
const mb_for = _.round(mb_post_for - mb_pre_for, 2);
console.log(`memory consumption by for loop ${mb_for} megabyte`);

// for each loop
gc();
const mb_pre_foreach = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
console.log(`memory consumption before foreach loop ${mb_pre_foreach} megabyte`);
validateForEach(regressData);
gc();
const mb_post_foreach = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
const mb_foreach = _.round(mb_post_foreach - mb_pre_foreach, 2);
console.log(`memory consumption by foreach loop ${mb_foreach} megabyte`);

function validateFor(regressData) {
  for (var yxai = 0, yxal = regressData.length; yxai < yxal; yxai++) {
    const yx = regressData[yxai];
    for (var yxi = 0, yxl = yx.length; yxi < yxl; yxi++) {
      if (!_.isFinite(yx[yxi])) {
        throw new Error(`non-finite entry at [${yxai}, ${yxi}]`);
      }
    }
  }
};

function validateForEach(regressData) {
  regressData.forEach((yxa, yxaIndex) => {
    yxa.forEach((yx, yxIndex) => {
      if (!_.isFinite(yx)) {
        throw new Error(`non-finite entry at [${yxaIndex}, ${yxIndex}]`);
      }
    });
  });
};

Output: 输出：

toms-mbp-2:mem_test tommedema$ node --expose-gc test_foreach.js
memory consumption before for loop 1549.31 megabyte
memory consumption by for loop 0.31 megabyte
memory consumption before foreach loop 1549.66 megabyte
memory consumption by foreach loop 3087.9 megabyte

Answer 1

(V8 developer here.) This is an unfortunate consequence of how Array.forEach is implemented in V8's old execution pipeline (full codegen + Crankshaft). （这里是V8开发人员。）这是如何在V8的旧执行管道（完全代码生成器+ Crankshaft）中实现Array.forEach一个不幸结果。 In short, what happens is that under some circumstances, using forEach on an array changes the internal representation of that array to a much less memory efficient format. 简而言之，在某些情况下，在数组上使用forEach会将该数组的内部表示更改为内存效率更低的格式。 (Specifically: if the array contained only double values before, and forEach has also been used on arrays with elements of other types but not too many different kinds of objects, and the code runs hot enough to get optimized. It's fairly complicated ;-) ) （具体来说：如果数组之前只包含双值，并且forEach也用于具有其他类型元素但没有太多不同类型对象的数组，并且代码运行得足够热以进行优化。它相当复杂;-) ）

With the new execution pipeline (currently behind the --future flag, will be turned on by default soon), I'm no longer seeing this additional memory consumption. 使用新的执行管道（当前位于--future标志后面，默认情况下将很快打开），我不再看到这种额外的内存消耗。

(That said, classic for loops do tend to have a small performance advantage over forEach , just because there's less going on under the hood (per ES spec). In many real workloads, the difference is too small to matter, but in microbenchmarks it's often visible. We might be able to optimize away more of forEach 's overhead in the future, but in cases where you know that every CPU cycle matters, I recommend using plain old for (var i = 0; i < array.length; i++) loops.) （也就是说，经典的for循环确实比forEach具有更小的性能优势，仅仅是因为在引擎盖下（根据ES规范）更少。在许多实际工作负载中，差异太小而无关紧要，但在微基准测试中，它是我们或许可以在将来优化掉更多forEach的开销，但是如果你知道每个CPU周期都很重要，我建议使用plain old for (var i = 0; i < array.length; i++)循环。）

在Javascript（V8）中，为什么数组上的forEach比简单的for循环消耗更多的内存？

问题描述

1 个解决方案

解决方案1
6 2017-02-27 18:54:06

在Javascript（V8）中，为什么数组上的forEach比简单的for循环消耗更多的内存？

问题描述

1 个解决方案

解决方案1 6 2017-02-27 18:54:06

解决方案1
6 2017-02-27 18:54:06