[英]In Javascript (V8) why does forEach on an array consume much more memory than a simple for loop?
I'm performing some simple data validation on a large set of data in Node.js (version v7.5.0, with a matrix of 15849x12771 entries). 我正在对Node.js中的大量数据执行一些简单的数据验证(版本v7.5.0,矩阵为15849x12771条目)。 The entire data set is in memory for now, for performance reasons.
出于性能原因,整个数据集现在都在内存中。 Therefore it is critical for me to reduce the amount of memory consumed to a theoretical minimum (each number representing 8 bytes in JS).
因此,对于我来说,将消耗的内存量减少到理论最小值(每个数字代表JS中的8个字节)至关重要。
Please compare the following ways of achieving the same thing. 请比较以下实现相同方法的方法。
with forEach
与
forEach
regressData.forEach((yxa, yxaIndex) => {
yxa.forEach((yx, yxIndex) => {
if (!_.isFinite(yx)) {
throw new Error(`non-finite entry at [${yxaIndex}, ${yxIndex}]`);
}
});
});
This consumes all of my node process' memory at 4GB+, causing it to never (until my patience runs out anyway) finish the loop (I guess it will use slower swap memory). 这消耗了我4GB +的所有节点进程内存,导致它永远不会(直到我耐心耗尽)完成循环(我猜它会使用较慢的交换内存)。
And then the identical version with a typical for
: 然后同一版采用了典型
for
:
for (var yxai = 0, yxal = regressData.length; yxai < yxal; yxai++) {
const yx = regressData[yxai];
for (var yxi = 0, yxl = yx.length; yxi < yxl; yxi++) {
if (!_.isFinite(yx[yxi])) {
throw new Error(`non-finite entry at [${yxai}, ${yxi}]`);
}
}
}
This consumes virtually no extra memory, causing the validation to be done in less than a second. 这几乎不消耗额外的内存,导致验证在不到一秒的时间内完成。
Is this behavior as expected? 这种行为是否符合预期? I had anticipated that because the
forEach
s have closed scopes there would be no issues of additional memory usage when compared to a traditional for
loop. 我曾预料到,因为
forEach
已经关闭了范围,所以与传统的for
循环相比,不存在额外的内存使用问题。
EDIT: standalone test 编辑:独立测试
node --expose-gc test_foreach.js node --expose-gc test_foreach.js
if (!gc) throw new Error('please run node like node --expose-gc test_foreach.js');
const _ = require('lodash');
// prepare data to work with
const x = 15849;
const y = 12771;
let regressData = new Array(x);
for (var i = 0; i < x; i++) {
regressData[i] = new Array(y);
for (var j = 0; j < y; j++) {
regressData[i][j] = _.random(true);
}
}
// for loop
gc();
const mb_pre_for = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
console.log(`memory consumption before for loop ${mb_pre_for} megabyte`);
validateFor(regressData);
gc();
const mb_post_for = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
const mb_for = _.round(mb_post_for - mb_pre_for, 2);
console.log(`memory consumption by for loop ${mb_for} megabyte`);
// for each loop
gc();
const mb_pre_foreach = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
console.log(`memory consumption before foreach loop ${mb_pre_foreach} megabyte`);
validateForEach(regressData);
gc();
const mb_post_foreach = _.round(process.memoryUsage().heapUsed / 1024 / 1024, 2);
const mb_foreach = _.round(mb_post_foreach - mb_pre_foreach, 2);
console.log(`memory consumption by foreach loop ${mb_foreach} megabyte`);
function validateFor(regressData) {
for (var yxai = 0, yxal = regressData.length; yxai < yxal; yxai++) {
const yx = regressData[yxai];
for (var yxi = 0, yxl = yx.length; yxi < yxl; yxi++) {
if (!_.isFinite(yx[yxi])) {
throw new Error(`non-finite entry at [${yxai}, ${yxi}]`);
}
}
}
};
function validateForEach(regressData) {
regressData.forEach((yxa, yxaIndex) => {
yxa.forEach((yx, yxIndex) => {
if (!_.isFinite(yx)) {
throw new Error(`non-finite entry at [${yxaIndex}, ${yxIndex}]`);
}
});
});
};
Output: 输出:
toms-mbp-2:mem_test tommedema$ node --expose-gc test_foreach.js
memory consumption before for loop 1549.31 megabyte
memory consumption by for loop 0.31 megabyte
memory consumption before foreach loop 1549.66 megabyte
memory consumption by foreach loop 3087.9 megabyte
(V8 developer here.) This is an unfortunate consequence of how Array.forEach
is implemented in V8's old execution pipeline (full codegen + Crankshaft). (这里是V8开发人员。)这是如何在V8的旧执行管道(完全代码生成器+ Crankshaft)中实现
Array.forEach
一个不幸结果。 In short, what happens is that under some circumstances, using forEach
on an array changes the internal representation of that array to a much less memory efficient format. 简而言之,在某些情况下,在数组上使用
forEach
会将该数组的内部表示更改为内存效率更低的格式。 (Specifically: if the array contained only double values before, and forEach
has also been used on arrays with elements of other types but not too many different kinds of objects, and the code runs hot enough to get optimized. It's fairly complicated ;-) ) (具体来说:如果数组之前只包含双值,并且
forEach
也用于具有其他类型元素但没有太多不同类型对象的数组,并且代码运行得足够热以进行优化。它相当复杂;-) )
With the new execution pipeline (currently behind the --future
flag, will be turned on by default soon), I'm no longer seeing this additional memory consumption. 使用新的执行管道(当前位于
--future
标志后面,默认情况下将很快打开),我不再看到这种额外的内存消耗。
(That said, classic for
loops do tend to have a small performance advantage over forEach
, just because there's less going on under the hood (per ES spec). In many real workloads, the difference is too small to matter, but in microbenchmarks it's often visible. We might be able to optimize away more of forEach
's overhead in the future, but in cases where you know that every CPU cycle matters, I recommend using plain old for (var i = 0; i < array.length; i++)
loops.) (也就是说,经典的
for
循环确实比forEach
具有更小的性能优势,仅仅是因为在引擎盖下(根据ES规范)更少。在许多实际工作负载中,差异太小而无关紧要,但在微基准测试中,它是我们或许可以在将来优化掉更多forEach
的开销,但是如果你知道每个CPU周期都很重要,我建议使用plain old for (var i = 0; i < array.length; i++)
循环。)
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.