Performance: 32-bit vs. 64-bit arithmetic

2012-01-20 22:57:59 · Tags: c++, c, linux, performance, x86-64

Are native 64-bit integer arithmetic instructions slower than their 32-bit counterparts (on an x86_64 machine with a 64-bit OS)?

Edit: On current CPUs such as Intel Core 2 Duo, i5/i7, etc.

3 answers

It depends on the exact CPU and operation. On 64-bit Pentium IVs, for example, multiplication of 64-bit registers was quite a bit slower. Core 2 and later CPUs have been designed for 64-bit operation from the ground up.

Generally, even code written for a 64-bit platform uses 32-bit variables where the values will fit in them. This isn't primarily because the arithmetic is faster (on modern CPUs, it generally isn't) but because it uses less memory and memory bandwidth.

A structure containing a dozen integers will be half the size if those integers are 32-bit than if they are 64-bit. This means it will take half as many bytes to store, half as much space in the cache, and so on.
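To make the size argument concrete, here is a minimal C sketch. The names `record32`, `record64`, and `records_that_fit` are made up for illustration, and the 4 KiB budget in the usage note is just an arbitrary example size.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

#define FIELD_COUNT 12   /* "a dozen integers" */

/* Hypothetical records: same field count, different integer width. */
struct record32 { int32_t v[FIELD_COUNT]; };
struct record64 { int64_t v[FIELD_COUNT]; };

/* Checked at compile time: the 64-bit record is exactly twice as large. */
static_assert(sizeof(struct record32) == 48, "12 x 4 bytes");
static_assert(sizeof(struct record64) == 96, "12 x 8 bytes");

/* How many whole records fit in a given number of bytes. */
size_t records_that_fit(size_t budget_bytes, size_t record_size)
{
    return budget_bytes / record_size;
}
```

With a 4096-byte budget, `records_that_fit(4096, sizeof(struct record32))` gives 85 records while the 64-bit version gives 42, so roughly twice as many of the narrower records stay resident in the same amount of cache.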

64-bit native registers and arithmetic are used where values may not fit into 32 bits. But the main performance benefits come from the extra general-purpose registers available in the x86_64 instruction set. And of course, there are all the benefits that come from 64-bit pointers.

So the real answer is that it doesn't matter. Even if you use x86_64 mode, you can (and generally do) still use 32-bit arithmetic where it will do, and you get the benefits of larger pointers and more general-purpose registers. When you use 64-bit native operations, it's because you need 64-bit operations, and you know they'll be faster than faking them with multiple 32-bit operations, which is your only other choice. So the relative performance of 32-bit versus 64-bit registers should never be a deciding factor in any implementation decision.
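As a rough illustration of what "faking it with multiple 32-bit operations" involves, the sketch below emulates a 64-bit addition with two 32-bit halves and a manual carry, next to the native version. The type `u64_parts` and all function names are hypothetical; a real compiler targeting a 32-bit machine would typically use an add/add-with-carry instruction pair rather than this exact code.

```c
#include <stdint.h>

/* A 64-bit value held as two 32-bit halves (hypothetical helper type). */
struct u64_parts { uint32_t lo; uint32_t hi; };

struct u64_parts split64(uint64_t x)
{
    struct u64_parts p = { (uint32_t)x, (uint32_t)(x >> 32) };
    return p;
}

uint64_t join64(struct u64_parts p)
{
    return ((uint64_t)p.hi << 32) | p.lo;
}

/* Emulated 64-bit add: two 32-bit adds plus carry propagation. */
struct u64_parts add64_emulated(struct u64_parts a, struct u64_parts b)
{
    struct u64_parts r;
    r.lo = a.lo + b.lo;               /* wraps modulo 2^32 */
    uint32_t carry = (r.lo < a.lo);   /* 1 if the low half wrapped */
    r.hi = a.hi + b.hi + carry;
    return r;
}

/* Native 64-bit add: a single operation on a 64-bit register. */
uint64_t add64_native(uint64_t a, uint64_t b)
{
    return a + b;
}
```

Both paths produce the same result; the emulated one simply needs more operations for each addition, and multiplication or division gets considerably worse.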

I just stumbled upon this question, but I think one very important aspect is missing here: if you look closely at the generated assembly, using the type 'int' for indices will likely slow down the code your compiler generates. This is because 'int' defaults to a 32-bit type on many 64-bit compilers and platforms (Visual Studio, GCC), and doing address calculations that mix pointers (which are necessarily 64-bit on a 64-bit OS) with 'int' will cause the compiler to emit unnecessary conversions between 32-bit and 64-bit registers. I've just experienced this in a very performance-critical inner loop of my code. Switching from 'int' to 'long long' as the loop index improved my algorithm's run time by about 10%, which was quite a huge gain considering the extensive SSE/AVX2 vectorization I was already using at that point.
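A minimal sketch of the two index styles, for anyone who wants to compare the generated code themselves. This is not the loop from my code; the function names are made up, and whether the compiler actually emits extra 32-to-64-bit conversions depends on the compiler, the optimization level, and the loop body, so it is worth inspecting the assembly (for example with `gcc -O2 -S`) rather than assuming.

```c
/* Index is a 32-bit 'int' on typical x86_64 ABIs; it has to be widened
 * to 64 bits before it can be combined with the pointer. */
float sum_int_index(const float *data, int n)
{
    float total = 0.0f;
    for (int i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}

/* Index is already 64 bits wide, matching the pointer width, so no
 * widening is needed in the address calculation. */
float sum_wide_index(const float *data, long long n)
{
    float total = 0.0f;
    for (long long i = 0; i < n; ++i) {
        total += data[i];
    }
    return total;
}
```

The two functions compute the same result; only the index width differs, which makes them easy to diff at the assembly level.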

In a primarily 32-bit application (meaning only 32-bit arithmetic is used, and 32-bit pointers are sufficient), the real benefits of the x86-64 architecture are the other "updates" AMD made to the architecture: