

Is there a performance advantage to ARM64?

Posted 2014-11-10 09:44:28 · Tags: performance, arm, 64-bit, arm64

Recently, 64-bit ARM mobile phones have started appearing. But is there any practical advantage to building an application as 64-bit? Specifically, consider an application that has little use for the increased virtual address space¹ but would waste some space due to the increased pointer size.

So does ARM64 have any advantages other than the larger address space that would actually warrant building such an application as 64-bit?

Note: I've seen 64-bit Performance Advantages, but it only mentions x86-64, which has other improvements besides the extended virtual address space. I also recall that this situation is specific to x86: on some other platforms that went 64-bit, such as SPARC, the usual approach was to compile only the kernel and the applications that actually used a lot of memory as 64-bit, and everything else as 32-bit.

¹ The application is multi-platform, and it still needs to be built for and run on devices with as little as 48 MiB of memory. It does have some large data that it reads from external storage, but it never needs more than a few megabytes of it at once.

1 answer

I am not sure a general answer can be given, but I can provide some examples of differences. There are, of course, additional differences introduced in version 8 of the ARM architecture that apply regardless of the target instruction set (AArch32 or AArch64).

Performance-positive additions in AArch64

31 general-purpose registers (X0-X30, versus 16 in AArch32, which include SP, LR and PC) give compilers more wiggle room.
I/D cache synchronization mechanisms are accessible from user mode (no system call needed).
Load/Store-Pair instructions make it possible to load 128 bits of data with one instruction while remaining RISC-like.
The removal of near-universal conditional execution makes more out-of-order execution possible.
The change in the layout of NEON registers (D0 is still the lower half of Q0, but D1 is now the lower half of Q1 rather than the upper half of Q0) makes more out-of-order execution possible, since writing D1 no longer partially overlaps Q0.
64-bit pointers make pointer tagging possible, because the upper bits of a pointer are not needed for addressing.
CSEL (conditional select) enables all kinds of crazy branchless optimizations.

Performance-negative changes in AArch64

More registers may also mean higher pressure on the stack, since there are more registers to save and restore.
Larger pointers mean a larger memory footprint.
Removal of near-universal conditional execution may put more pressure on the branch predictor.
Removal of load/store-multiple (LDM/STM) means more instructions are needed for function entry/exit.

Performance-relevant changes in ARMv8-A

Load-Acquire/Store-Release semantics remove the need for explicit memory barriers in basic synchronization operations.

I probably forgot lots of things, but those are some of the more obvious changes.