
What EXACTLY is the difference between Intel's and AMD's ISA, if any?

Posted 2016-07-22 01:27:55. Tags: x86-64, intel, amd-processor, instruction-set

I know people have asked similar questions before, but there is so much conflicting information that I really want to try to clear it up once and for all. I will attempt to do so by clearly distinguishing between the instruction set architecture (ISA) and the actual hardware implementation. First, my attempted clarifications:

1.) Currently there are Intel 64 and AMD64 CPUs out there (among others, but these are the focus).

2.) Given that an ISA is the binary representation of 1 or more CPU instructions, this means an ISA is completely separate from its actual hardware implementation.

My question(s):

Do the differences between Intel 64 and AMD64 CPUs have to do with different or extended x86-64 ISAs? Or with different hardware implementations of the x86-64 ISA? Or both?

1 Answer

Yes, the ISA is a document / specification, not hardware. Implementing all of it correctly is what makes something an x86 CPU, rather than just something with similarities to x86.

See the x86 tag wiki for links to the official docs (Intel's manuals).

Intel's and AMD's implementations of the x86 ISA differ mainly in performance, and in which extensions to the instruction set they support. Software can query what's supported using the CPUID instruction.

There are also non-performance differences, like occasional minor differences in the semantics of instructions, especially privileged instructions that OSes need to use.

One of the major divergences here is that Intel, AMD, and VIA each have their own hardware-virtualization extensions, which don't even try to be compatible. So a VM like Xen needs separate "driver" or "backend" code for each of these extensions. But those are still extensions, not part of baseline x86.

SIMD extensions for use by user-space programs end up being available on both, often with a delay thanks to Intel's efforts to screw over AMD with anti-competitive practices. This costs everyone else time, and is often detrimental to the overall x86 ecosystem (e.g. SSSE3 could have been assumed as a baseline for more software by now), but helps Intel's bottom line. A good example here: AMD Bulldozer supports FMA4, but Intel changed their mind at the last minute and implemented FMA3 in Haswell. AMD didn't support FMA3 until their next microarchitecture (Piledriver).
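The extension differences described above are visible to software. As a rough sketch, assuming a Linux system (where the kernel publishes the feature bits it decoded from CPUID in /proc/cpuinfo), the following Python checks a few of the extensions mentioned here. The helper names `parse_cpuinfo` and `report` are made up for illustration. The flag spellings are the ones Linux uses: `fma` means FMA3, `vmx` is Intel's virtualization extension, and `svm` is AMD's.

```python
from pathlib import Path

# Flag names as Linux spells them in /proc/cpuinfo.
FLAGS_OF_INTEREST = ("ssse3", "fma", "fma4", "vmx", "svm")


def parse_cpuinfo(text):
    """Return (vendor_id, set_of_flags) for the first CPU listed in text."""
    vendor, flags = None, set()
    for line in text.splitlines():
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "vendor_id" and vendor is None:
            vendor = value.strip()
        elif key == "flags" and not flags:
            flags = set(value.split())
    return vendor, flags


def report(text):
    """Return (vendor_id, {flag: supported}) for the flags of interest."""
    vendor, flags = parse_cpuinfo(text)
    return vendor, {name: name in flags for name in FLAGS_OF_INTEREST}


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # assumption: Linux on an x86 machine
    if path.exists():
        vendor, support = report(path.read_text())
        print("vendor:", vendor)
        for name, present in support.items():
            print(f"{name:6s} {'yes' if present else 'no'}")
    else:
        print("no /proc/cpuinfo on this system")
```

Code that actually dispatches on CPU features would normally execute CPUID itself (or use a compiler or library wrapper around it) rather than parse text; reading /proc/cpuinfo is just a convenient way to see what CPUID reports.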

"Given that an ISA is the binary representation of 1 or more CPU instructions."

No, an ISA is much more than that. Everything that Intel documents as being guaranteed across all x86 CPUs is part of the ISA. This isn't just the detailed behaviour of every instruction, but also stuff like which control register does what, and the memory ordering rules. Basically, it's everything in the manuals published by Intel and AMD that isn't prefaced by "on such and such a specific model of CPU".

I expect there are a few cases where Intel's and AMD's system programming guides differ on how x86 should work. (And VIA's, if they publish their own for their x86 CPUs.) I haven't checked, but I'm pretty sure user-space doesn't suffer from this: if there are differences, they're limited to privileged instructions that only work if the kernel runs them. Anyway, in that case I guess you could say the x86 ISA is the common subset of what Intel and AMD document.

Note that experimenting to find out what real hardware does in practice is useful for understanding the docs, but NOT a replacement for reading them. You don't want your code to rely on how an instruction happens to behave on the CPU you tested.

However, Intel does test their new designs with real software, because not being able to run existing versions of Windows would be a downside commercially. E.g. Windows 9x doesn't invalidate a TLB entry that could only have been filled speculatively (all the rest of this example is just a summary of and extrapolation from that very detailed blog post). This was either a performance hack based on the assumption that it was safe (and it was safe on hardware at the time), or an unnoticed bug. It couldn't have been detected by testing on hardware at the time.

Modern Intel CPUs do speculative page walks, but even as recently as Haswell they detect and shoot down mis-speculation, so code that assumes this doesn't happen will still work.

This means the real hardware gives a stronger ordering guarantee than the ISA, which says:

"The processor may cache translations required for prefetches and for accesses that are a result of speculative execution that would never actually occur in the executed code path."

Still, depending on this stronger behaviour would be a mistake, unless you only do it on known microarchitectures. AMD K8/K10 is like Intel, but Bulldozer-family speculates without any detect+rollback mechanism to give coherence, so that Win9x kernel code isn't safe on that hardware. And future Intel hardware might drop the detect+rollback mechanism, too.

TL;DR: all the uarches implement what the x86 ISA says, but some give stronger guarantees. If you're as big as Microsoft, Intel and AMD will design CPUs that reproduce the non-ISA-guaranteed behaviour that your code depends on, at least until that software is long obsolete. There's no true guarantee that future Intel uarches will keep the rollback mechanism. If Intel ever does another redesign from the ground up (like P4 / NetBurst, instead of just building on their existing Sandybridge uarch family), that would be when they could plausibly change something.

A different example: the bsf instruction with an input of zero leaves its output undefined, according to the paper spec in Intel's insn ref manual.

But for any specific CPU, there will be some pattern of behaviour, like setting the output to zero, or leaving it unchanged. On paper, it would be valid for an out-of-order-execution CPU to really give unpredictable results that differ for the same inputs, because of different microarchitectural state.

But the behaviour Intel chooses to implement in silicon is to always leave the destination unchanged when the bsf or bsr input is zero. AMD does the same, and even documents the behaviour. It's basically an unofficial guarantee that mov eax,32 / bsf eax,ebx will work exactly like tzcnt (except for flag setting, e.g. ZF based on the input being 0, rather than the output).
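To make the comparison concrete, here is a small Python model of the register and flag results. It models the behaviour described above (destination unchanged on zero input, which AMD documents and Intel implements without architecturally guaranteeing it), not what the Intel manual promises. The function names `bsf_model` and `tzcnt_model` are hypothetical, and only the flags relevant to the comparison are modelled.

```python
def bsf_model(dest, src, width=32):
    """Model BSF as implemented in silicon: return (new_dest, zf).

    Assumption: a zero input leaves the destination unchanged, which is
    the real-hardware behaviour described above, not an ISA guarantee.
    """
    src &= (1 << width) - 1
    if src == 0:
        return dest, True          # dest untouched, ZF=1 (input was zero)
    index = (src & -src).bit_length() - 1   # index of lowest set bit
    return index, False            # ZF=0 (input was non-zero)


def tzcnt_model(src, width=32):
    """Model TZCNT: return (result, zf, cf)."""
    src &= (1 << width) - 1
    if src == 0:
        result = width             # defined result: the operand size
    else:
        result = (src & -src).bit_length() - 1
    zf = result == 0               # ZF reflects the output, not the input
    cf = src == 0                  # CF=1 when the input was zero
    return result, zf, cf


if __name__ == "__main__":
    for x in (0, 1, 8, 0x80000000):
        # mov eax,32 / bsf eax,ebx  versus  tzcnt eax,ebx
        print(hex(x), bsf_model(32, x), tzcnt_model(x))
```

In this model the register results match for every input, while ZF differs: bsf sets it when the input is zero, tzcnt when the result is zero.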

This is why popcnt / lzcnt / tzcnt have a false dependency on the output register in Intel CPUs.

It's common for CPU vendors to go above and beyond the paper ISA spec to avoid breaking some existing code that depends on this behaviour (e.g. if that code is part of Windows, or other major pieces of software that Intel / AMD test on their new CPU designs).

As Andy Glew said in a comment thread about the coherent page walk thing mentioned above, and about self-modifying code:

"It is pretty common that a particular implementation has to implement rules compatible with but stronger than the architectural statement. But not all implementations have to do it the same way."