
x64 vs x86 Performance Considerations .Net

Original post: 2011-06-28 20:32:50 | Tags: c#, performance, 64-bit

I am trying to understand what performance differences exist when running a native C# / .NET 4.0 app in x64 vs x86. I understand the memory considerations (x64 addressing all memory, x86 limited to 2/4 GB), as well as the fact that an x64 app will use more memory (all pointers are 8 bytes instead of 4 bytes). As far as I can tell, none of these should affect any of the clock-for-clock instructions, as the x64 pipeline is wide enough to handle the wider instructions.

Is there a performance hit in context switching, due to the larger stack size for each thread? What performance considerations am I missing in evaluating the two?
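For anyone comparing the two modes, it helps to confirm which one a given run is actually using. The following is a minimal sketch, not part of the original question; the class name `BitnessInfo` and the method `Describe` are made up for illustration, while `Environment.Is64BitProcess`, `Environment.Is64BitOperatingSystem` and `IntPtr.Size` are standard .NET 4.0 members.

```csharp
using System;

static class BitnessInfo
{
    // Returns a one-line description of the current process's bitness.
    // (BitnessInfo and Describe are hypothetical names for this sketch.)
    public static string Describe()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        return string.Format(
            "64-bit process: {0}, 64-bit OS: {1}, pointer size: {2} bytes",
            Environment.Is64BitProcess,
            Environment.Is64BitOperatingSystem,
            IntPtr.Size);
    }

    static void Main()
    {
        Console.WriteLine(Describe());
    }
}
```

An Any CPU build of this program should report a different pointer size depending on whether it is launched as a 32-bit or a 64-bit process.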

2 Answers

Joe White has given you some good reasons why your app might be slower. Larger pointers (and, by extension, larger references in .NET) take up more space in memory, meaning less of your code and data will fit into the cache.

There are, however, plenty of beneficial reasons you might want to use x64:

The AMD64 calling convention is used by default in x64 and can be quite a bit faster than the standard cdecl or stdcall, with many arguments being passed in registers and the XMM registers being used for floating point.
The CLR will emit scalar SSE instructions for floating point operations in 64-bit. In x86 it falls back on the standard x87 FP stack, which is quite a bit slower, especially for things like converting between ints and floats.
Having more registers means that there is much less chance that the JIT will have to spill them due to register pressure. Spilling registers can be quite costly for fast inner loops, especially if a function gets inlined and introduces additional register pressure there.
Any operations on 64-bit integers can benefit tremendously from fitting into a single register instead of being broken up into two separate halves.
This may be obvious, but the additional memory your process can access can be quite useful if your application is memory-intensive, even if it isn't hitting the theoretical limit. Fragmentation can cause you to hit "out of memory" conditions long before you reach that mark.
RIP-relative addressing in x64 can, in some cases, reduce the size of an executable image. Although that doesn't really apply directly to .NET apps, it can have an effect on the sharing of DLLs which may otherwise have to be relocated. I'd be interested in knowing if anyone has any specific information on this with regards to .NET and managed applications.

Aside from these, the x64 version of the .NET runtime seems to, at least in the current versions, perform more optimizations than the x86 equivalent. Things like inlining and memory alignment seem to happen much more often. In fact, there was a bug a while back that prevented inlining of any method that took or returned a value type; I remember seeing it fixed in the x64 version and not the x86 version.

Really, the only way you'll be able to tell which is better for your app is to profile and test on both architectures and compare real results. However, I personally just use Any CPU wherever possible and avoid anything inherently architecture-dependent. This makes it easy to build and deploy, and is hopefully more future-proof when the majority of users start switching to x64 exclusively.
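As a rough illustration of "test on both architectures", here is a minimal timing sketch built around the 64-bit integer point above. It is not a rigorous benchmark. The class `Int64Benchmark`, the method `Work`, the iteration count and the arithmetic constants are all arbitrary choices made for this example; only `Stopwatch` and `IntPtr.Size` are standard .NET APIs. The assumption is that you compile it twice in Release mode, once with the platform target set to x86 and once to x64, and run each build without a debugger attached.

```csharp
using System;
using System.Diagnostics;

static class Int64Benchmark
{
    // Hypothetical workload: a chain of dependent 64-bit multiply, add,
    // shift and xor operations, so each iteration needs the previous result.
    public static long Work(int iterations)
    {
        long acc = 1;
        for (int i = 0; i < iterations; i++)
        {
            unchecked // overflow is intended to wrap around here
            {
                acc = acc * 6364136223846793005L + 1442695040888963407L;
                acc ^= acc >> 29;
            }
        }
        return acc;
    }

    static void Main()
    {
        const int iterations = 100000000; // arbitrary; tune for your machine

        Work(1000); // warm-up call so JIT compilation is not timed

        Stopwatch sw = Stopwatch.StartNew();
        long result = Work(iterations);
        sw.Stop();

        // Printing the result keeps the computation from being discarded.
        Console.WriteLine("{0}-bit process: {1} ms (result {2})",
            IntPtr.Size * 8, sw.ElapsedMilliseconds, result);
    }
}
```

A single loop like this only says something about 64-bit integer arithmetic. For a real decision, time the hot paths of your own application the same way, or use a profiler.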

Closely related to "x64 app will use more memory" is the fact that, with a 64-bit app, your locality of reference is worse (because all your pointer sizes are doubled), so you get less mileage out of the CPU's on-board (ultra-fast) cache. You have to retrieve data from system RAM more often, which is much slower than the on-chip L1 and L2 caches.
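To put a number on the doubled pointer size, the sketch below estimates how much space the reference slots of an `object[]` occupy and how many of them fit in one cache line. The names `ReferenceFootprint`, `SlotBytes` and `SlotsPerCacheLine` are invented for this example, and the 64-byte cache line is an assumption about typical x86/x64 CPUs rather than something .NET reports. `GC.GetTotalMemory` gives only an approximate measurement.

```csharp
using System;

static class ReferenceFootprint
{
    // Approximate bytes used by the reference slots of an array holding
    // `count` object references (ignores the array header and the objects).
    public static long SlotBytes(int count)
    {
        return (long)count * IntPtr.Size;
    }

    // How many references fit in one cache line of the given size.
    public static int SlotsPerCacheLine(int cacheLineBytes)
    {
        return cacheLineBytes / IntPtr.Size;
    }

    static void Main()
    {
        const int count = 1000000;
        const int assumedCacheLineBytes = 64; // assumption, not queried from hardware

        long before = GC.GetTotalMemory(true);
        object[] refs = new object[count];
        long after = GC.GetTotalMemory(true);

        Console.WriteLine("Estimated slot bytes: {0}, measured growth: {1}",
            SlotBytes(count), after - before);
        Console.WriteLine("References per {0}-byte cache line: {1}",
            assumedCacheLineBytes, SlotsPerCacheLine(assumedCacheLineBytes));

        GC.KeepAlive(refs); // keep the array alive until after measuring
    }
}
```

Under that assumption, a 32-bit process fits 16 references in a cache line and a 64-bit process fits 8, which is the effect described above for reference-heavy data structures.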