JavaScript array performance drop at 13k-16k elements
I am doing some performance testing regarding the creation and editing performance of arrays and noticed some weird characteristics around arrays with about 13k-16k elements.
The graphs below show the time per element it takes to create an array and to read from it (in this case, summing the numbers in it).
capacity and push relate to the way the array was created:

    const arr = new Array(length);
    // and then
    arr[i] = data;

vs.

    const arr = [];
    // and then
    arr.push(data);
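The two strategies can be compared with a minimal benchmark sketch; the function names and the size chosen here are illustrative, not taken from the linked repo:

```javascript
// Minimal sketch of the two creation strategies being compared.
function createWithCapacity(length) {
  const arr = new Array(length); // pre-allocate the backing store
  for (let i = 0; i < length; i++) arr[i] = i;
  return arr;
}

function createWithPush(length) {
  const arr = []; // let the engine grow the backing store as needed
  for (let i = 0; i < length; i++) arr.push(i);
  return arr;
}

const n = 16384;
console.time('capacity');
createWithCapacity(n);
console.timeEnd('capacity');
console.time('push');
createWithPush(n);
console.timeEnd('push');
```

A single timed run like this is noisy; the graphs above average many runs per size.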
As you can see, in both cases, creating an array and reading from it, there is a drop of about 2-3x in per-element performance compared to arrays with about 1k fewer elements.
When creating an array using the push method, this jump happens a bit earlier than when creating it with the correct capacity beforehand. I assume this is because, when pushing to an array that is already at maximum capacity, more extra capacity is added than is immediately needed (to avoid having to grow again soon), so the threshold for the slower performance path is hit earlier.
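That over-allocation can be simulated. The growth formula used below (`new = old + old/2 + 16`) is an assumption based on my reading of V8's `JSObject::NewElementsCapacity`; treat the exact constants as illustrative:

```javascript
// Sketch of V8-style geometric growth of an array's backing store.
// The formula new = old + old/2 + 16 is an assumption based on
// V8's JSObject::NewElementsCapacity, not a documented guarantee.
function nextCapacity(old) {
  return old + (old >> 1) + 16;
}

// Simulate the capacities a pushed-to array goes through.
let capacity = 0;
const steps = [];
while (capacity < 16384) {
  capacity = nextCapacity(capacity);
  steps.push(capacity);
}
console.log(steps); // the final growth step overshoots 16384 by a wide margin
```

Under these assumptions the capacity jumps straight past 16384 while the array's length is still well below it, which would explain why the pushed array hits the slow path earlier.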
If you want to see the code or test it yourself: github
To me it seems that v8 starts treating larger arrays differently at around 13k-16k elements to improve performance for them, but the cut-off point (at least in my code) is slightly too early, so performance drops before the optimizations bring any benefit.
You can see the performance improvements tailing off after about 500 elements and picking up again after the drop.
Sadly, I can't find any information about this.
Also, if you happen to have any idea why there are those spikes at the end of creating with capacity and summing with push, feel free to let me know :)
Edit:
As suggested by @ggorlen, I ran the same test on a different machine (with a different, weaker CPU and less RAM) to rule out caching as the reason for the observed behavior. The results look very similar.
Edit:
I ran node with the --allow-natives-syntax flag to debug-log the created arrays with %DebugPrint(array);, hoping to see a difference between the different array lengths, but besides the length and memory addresses they all look the same. Here is one as an example:
    // For array created with capacity
    DebugPrint: 000002CB8E1ACE19: [JSArray]
     - map: 0x035206283321 <Map(HOLEY_SMI_ELEMENTS)> [FastProperties]
     - prototype: 0x036b86245b19 <JSArray[0]>
     - elements: 0x02cb8e1ace39 <FixedArray[1]> [HOLEY_SMI_ELEMENTS]
     - length: 1
     - properties: 0x0114c5d01309 <FixedArray[0]>
     - All own properties (excluding elements): {
        00000114C5D04D41: [String] in ReadOnlySpace: #length: 0x03f907ac1189 <AccessorInfo> (const accessor descriptor), location: descriptor
     }
    0000035206283321: [Map]
     - type: JS_ARRAY_TYPE
     - instance size: 32
     - inobject properties: 0
     - elements kind: HOLEY_SMI_ELEMENTS
     - unused property fields: 0
     - enum length: invalid
     - back pointer: 0x035206283369 <Map(PACKED_SMI_ELEMENTS)>
     - prototype_validity cell: 0x03f907ac15e9 <Cell value= 1>
     - instance descriptors #1: 0x009994a6aa31 <DescriptorArray[1]>
     - transitions #1: 0x009994a6a9d1 <TransitionArray[4]>
       Transition array #1:
         0x0114c5d05949 <Symbol: (elements_transition_symbol)>: (transition to PACKED_DOUBLE_ELEMENTS) -> 0x0352062832d9 <Map(PACKED_DOUBLE_ELEMENTS)>
     - prototype: 0x036b86245b19 <JSArray[0]>
     - constructor: 0x031474c124e9 <JSFunction Array (sfi = 000003CECD93C3A9)>
     - dependent code: 0x0114c5d01239 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
     - construction counter: 0

    // For array created with push
    DebugPrint: 000003B09882CE19: [JSArray]
     - map: 0x02ff94f83369 <Map(PACKED_SMI_ELEMENTS)> [FastProperties]
     - prototype: 0x0329b3805b19 <JSArray[0]>
     - elements: 0x03b09882ce39 <FixedArray[17]> [PACKED_SMI_ELEMENTS]
     - length: 1
     - properties: 0x03167aa81309 <FixedArray[0]>
     - All own properties (excluding elements): {
        000003167AA84D41: [String] in ReadOnlySpace: #length: 0x02094f941189 <AccessorInfo> (const accessor descriptor), location: descriptor
     }
    000002FF94F83369: [Map]
     - type: JS_ARRAY_TYPE
     - instance size: 32
     - inobject properties: 0
     - elements kind: PACKED_SMI_ELEMENTS
     - unused property fields: 0
     - enum length: invalid
     - back pointer: 0x03167aa81599 <undefined>
     - prototype_validity cell: 0x02094f9415e9 <Cell value= 1>
     - instance descriptors #1: 0x00d25122aa31 <DescriptorArray[1]>
     - transitions #1: 0x00d25122aa01 <TransitionArray[4]>
       Transition array #1:
         0x03167aa85949 <Symbol: (elements_transition_symbol)>: (transition to HOLEY_SMI_ELEMENTS) -> 0x02ff94f83321 <Map(HOLEY_SMI_ELEMENTS)>
     - prototype: 0x0329b3805b19 <JSArray[0]>
     - constructor: 0x009ff8a524e9 <JSFunction Array (sfi = 0000025A84ABC3A9)>
     - dependent code: 0x03167aa81239 <Other heap object (WEAK_FIXED_ARRAY_TYPE)>
     - construction counter: 0
Edit:
The performance drop for the summing happens when going from an array of size 13_994 to 13_995:
(V8 developer here.)
There are two separate effects here:
(1) What happens at 16384 elements is that the backing store is allocated in "large object space", a special region of the heap that's optimized for large objects. In Chrome, where pointer compression is enabled, this happens at exactly twice as many elements as in Node, where pointer compression is off. It has the consequence that the allocation itself can no longer happen as an inlined sequence of instructions directly in optimized code; instead it's a call to a C++ function, which aside from having some call overhead is also a more generic implementation and as such a bit slower (there might be some optimization potential, not sure). So it's not an optimization that kicks in too early; it's just a cost that's paid by huge objects. And it'll only show up prominently in (tiny microbenchmark?) cases that allocate many large arrays and then don't do much with them.
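A back-of-the-envelope check makes the factor of two plausible. The ~128 KiB large-object threshold and per-slot sizes below are my assumptions about V8's internals, not stated in the answer:

```javascript
// Assumed: backing stores larger than ~128 KiB go to large object space.
// SMI elements take 8 bytes per slot without pointer compression (Node)
// and 4 bytes per slot with it (Chrome).
const thresholdBytes = 128 * 1024;
const nodeElements = thresholdBytes / 8;   // backing store crosses the threshold here
const chromeElements = thresholdBytes / 4; // exactly twice as many elements
console.log(nodeElements, chromeElements); // 16384 32768
```

Under these assumptions, Node crosses into large object space at 16384 elements and Chrome at 32768, matching the "exactly twice as many" observation.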
(2) What happens at 13995 elements is that for the specific function sum, that's when the optimizing compiler kicks in to "OSR" (on-stack replace) the function, i.e. the function is replaced with optimized code while it is running. That's a certain one-off cost no matter when it happens, and it will pay for itself shortly after. So perceiving this as a specific hit at a specific time is a typical microbenchmarking artifact that's irrelevant in real-world usage. (For instance, if you ran the test multiple times in the same process, you wouldn't see a step at 13995 any more. If you ran it with multiple sizes, then chances are OSR wouldn't be needed (as the function could switch to optimized code the next time it's called) and this one-off cost wouldn't happen at all.)
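The warm-up idea can be sketched like this; `sum` here is a stand-in for the benchmark's summing function, and the iteration counts are arbitrary:

```javascript
// Warming up `sum` lets the optimizer compile it before the measured run,
// so no one-off OSR step shows up at a particular array size.
function sum(arr) {
  let total = 0;
  for (let i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

const warmup = new Array(1000).fill(1);
for (let i = 0; i < 10000; i++) sum(warmup); // trigger optimized compilation

const data = new Array(13995).fill(1);
const start = process.hrtime.bigint();
const result = sum(data);
console.log(result, process.hrtime.bigint() - start); // 13995 <elapsed ns>
```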
TL;DR: Nothing to see here, just microbenchmarks producing confusing artifacts.