
Why is sorting 32-bit numbers in JavaScript so much faster than sorting 33-bit numbers?

The following code simply creates an array and sorts it. Strangely, on my 2013 MacBook Pro, it took 5.8 seconds to sort the 30-bit numbers:

    n = 10000000;
    numMax = 1000000000;

    console.log(`numMax is how many bits: ${Math.ceil(Math.log(numMax) / Math.log(2))}`);

    console.log("\n## Now creating array");
    let start = Date.now();
    let arr = new Array(n);
    for (let i = 0; i < arr.length; ++i) arr[i] = Math.floor(Math.random() * numMax);
    console.log(`took ${(Date.now() - start) / 1000} seconds to create the array`);

    console.log("\n## Now sorting it");
    start = Date.now();
    arr.sort((a, b) => a - b);
    console.log(`took ${(Date.now() - start) / 1000} seconds to sort`);

But let's say we make them 34-bit numbers. Now it takes 12.7 seconds to run:

    n = 10000000;
    numMax = 10000000000;

    console.log(`numMax is how many bits: ${Math.ceil(Math.log(numMax) / Math.log(2))}`);

    console.log("\n## Now creating array");
    let start = Date.now();
    let arr = new Array(n);
    for (let i = 0; i < arr.length; ++i) arr[i] = Math.floor(Math.random() * numMax);
    console.log(`took ${(Date.now() - start) / 1000} seconds to create the array`);

    console.log("\n## Now sorting it");
    start = Date.now();
    arr.sort((a, b) => a - b);
    console.log(`took ${(Date.now() - start) / 1000} seconds to sort`);

On NodeJS (update: I am using v12.14.0), the difference is even bigger: 5.05 seconds vs 28.9 seconds. Why is the difference so large? If it is because Chrome or NodeJS can optimize it by using 32-bit integers rather than 64-bit integers or IEEE 754 numbers, would it take exactly one clock cycle to do the compare during the sort (and to move the data during the "partition phase" of Quicksort)? Why would it take more than 2 times, or even 5 times, as long? Does it also have something to do with fitting all the data in the processor's internal cache, and whether that cache can hold 32-bit integers but not IEEE 754 numbers?

V8 developer here. In short: this is why V8 uses "Smis" (small integers) internally when it can.

In JavaScript, any value can generally be anything, so engines typically represent values in some format that stores type information along with the value itself. This includes numbers: a number on the heap is an object with two fields, a type descriptor and the actual number value, which per the JavaScript spec is an IEEE 754 64-bit double.

Since small-ish, integer-valued numbers are particularly common, V8 uses a special trick to encode them more efficiently: they're not stored as an object on the heap at all; instead, the value is directly encoded into the "pointer", and one of the pointer's bits is used to tag it as a so-called Smi (small integer). In all current versions of Chrome, V8 uses 32-bit heap pointers, which leaves 31 bits for the payload of a Smi.

Since arrays of numbers are also fairly common, storing a type descriptor for each element would be fairly wasteful; instead, V8 has double arrays, where the array itself remembers (only once!) that all of its elements are doubles, and those elements can then be stored directly in the array.
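
As a rough illustration, the tagging scheme can be sketched in JavaScript itself. This is a simplification under my own assumptions, not actual V8 code: the 31-bit payload lives in the upper bits of a 32-bit word, and the lowest bit distinguishes Smis from heap pointers:

    // A simplified sketch of 31-bit Smi tagging (not actual V8 source):
    // the payload is shifted left by one, leaving the lowest bit as the
    // tag (0 = Smi, 1 = heap pointer in this sketch).
    const tagSmi = (v) => v << 1;        // only valid for -2**30 .. 2**30 - 1
    const isSmi = (w) => (w & 1) === 0;  // check the tag bit
    const untagSmi = (w) => w >> 1;      // arithmetic shift restores the sign

    console.log(isSmi(tagSmi(12345)));   // true
    console.log(untagSmi(tagSmi(-7)));   // -7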

So in the 30-bit version of your code, the array's backing store is an array full of Smis, and calling the comparator function can pass two of those directly. That function, in turn, can quickly Smi-check and untag the values to perform the subtraction.

In the 34-bit version, the array's backing store stores doubles. Every time the comparator needs to be called, two raw doubles are read from the array, are boxed as "heap numbers" in order to be used as parameters for the function call, and the comparator function has to read the value from the heap before being able to subtract them. I'm actually surprised that this is only about twice as slow :-)
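
All that boxing produces short-lived garbage. If you want to see it yourself, one way is to run the 34-bit version under Node with GC tracing enabled (sort34.js is just a placeholder name for the snippet above):

    node --trace-gc sort34.js

During the sort you should see a burst of scavenge (young-generation) collections that the 30-bit version doesn't trigger.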

To play with the performance details of this testcase a bit, you can force the array to store heap numbers instead of unboxed doubles. While that consumes more memory up front and has a performance cost for many use cases, in this particular case it actually saves about 10% of the time, since less short-lived garbage is allocated during execution. If you additionally force the comparator's result to be returned as a Smi:

arr.sort((a, b) => a > b ? 1 : a < b ? -1 : 0);

it saves another 10%.
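
One way to force the heap-number storage mentioned above (a sketch based on my own assumption about the technique; the answer doesn't say which one was used) is to transition the array to a generic elements kind before filling it:

    // Storing a non-number first transitions the backing store to the
    // generic elements kind, and elements kinds never transition back,
    // so every double written afterwards is boxed as a heap number.
    let arr = new Array(n);
    arr[0] = {};  // force the generic elements kind
    for (let i = 0; i < arr.length; ++i)
      arr[i] = Math.floor(Math.random() * numMax);  // stored as boxed heap numbers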

On NodeJS, the difference is even bigger: 5.05 seconds vs 28.9 seconds.

With Node 13.11 I can't reproduce that; I'm getting almost exactly the same numbers as with Chrome 81.

Chrome or NodeJS can optimize it by using 32-bit integers rather than 64-bit integers or IEEE 754 numbers

Being able to use 32-bit integer CPU instructions is a side effect of using the Smi representation, but it's not the (primary) cause of the performance difference here. Using 64-bit integers internally would be a violation of the JavaScript spec (unless the engine were very careful to detect and avoid results that are too precise).
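
A small worked example of the "too precise" problem: the spec requires numbers to behave like IEEE 754 doubles, which cannot represent every integer above 2^53, so a true 64-bit integer implementation would produce observably different results:

    // Both comparisons are true per spec, because 2**53 + 1 rounds to
    // 2**53 as a double; an engine using real 64-bit integers would
    // compute the sum exactly and answer false.
    console.log(2 ** 53 + 1 === 2 ** 53);                // true
    console.log(9007199254740993 === 9007199254740992);  // true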

would it take exactly one clock cycle to do the compare

Estimating clock cycles on modern CPUs is very difficult, and almost nothing takes "exactly one clock cycle". On the one hand, CPUs can execute (parts of) more than one instruction per cycle; on the other hand, they have pipelines, which means that finishing the execution of even a single instruction takes many cycles. In particular, frequent branching (i.e., "decision-making" inside the CPU), which sorting algorithms typically require, tends to suffer from pipeline-related latencies.

the "partition phase" of Quicksort

V8 does not use Quicksort any more; since V8 7.0 it uses TimSort for Array.prototype.sort. That said, all sorting algorithms of course have to move data around.

Does it also have something to do with fitting all the data in the processor's internal cache, and whether that cache can hold 32-bit integers but not IEEE 754 numbers?

The CPU's cache does not care about the type of the data. The size of the data can cause caching-related performance differences, though: 64-bit doubles are twice as big as 32-bit Smis, so in this example the 10-million-element array occupies roughly 80 MB instead of 40 MB, doubling the memory traffic during the sort.

V8 is capable of optimizing the numeric storage type if the optimizer can deduce that all the values in the array will fit in that size

Almost; but there are no deductions involved: the array optimistically starts out with a Smi backing store and generalizes it as needed; e.g., when the first double is stored, the storage is switched over to a double array.
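
You can watch this transition with V8's internal test helpers. This is a version-dependent sketch: the %-prefixed functions require the --allow-natives-syntax flag and are not a stable API:

    // Run with: node --allow-natives-syntax transition.js
    const a = [1, 2, 3];
    console.log(%HasSmiElements(a));     // true: optimistic Smi backing store
    a[0] = 1.5;                          // the first double is stored...
    console.log(%HasDoubleElements(a));  // true: ...so storage generalized to doubles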

You are probably seeing the effect of "just-in-time compiling."

Not really. Of course all modern engines JIT-compile your code, but that's true for all code, and doesn't explain the difference here.
