Why didn't Intel make the high-order part of their CPUs' registers available?

When programming in assembly and doing some sort of string manipulation, I use al, ah and sometimes others to hold characters, because this lets me keep more data in my registers. I think this is a very handy feature, but Intel's engineers don't seem to agree with me, because they didn't make the two high-order bytes of the registers accessible (or am I wrong?). I don't understand why. I thought about this for a while and my guesses are:

  1. They would make the CPU too complicated
  2. They would be useless
  3. Perhaps both of the above

I came up with number two because I've never seen a compiled program (say, built with gcc) use al or bh or any of them.

Although it's a little clumsy, you can just swap the halves of a register with rol reg,16 (or ror reg,16, if you prefer). On the NetBurst CPUs (Pentium 4) that's quite inefficient, but on most newer (or older) CPUs you normally have a barrel shifter to do that in one clock.

As for why they didn't do it, it's pretty simple: they'd need to thoroughly redesign the instruction encoding if they really wanted to do it. In the original design, they used up all the codes that would fit in the sizes of the fields they used to specify a register. In fact, they already use something of a hack where the meaning of an encoding depends on the mode, and there are address-size and operand-size prefixes if you need to use a different size. For example, to use AX when you're running in 32-bit mode, the instruction will have an operand-size override prefix (0x66) before the instruction itself. If they'd wanted it badly enough, they could have extended that concept to specify things like "the byte in bits 16-23 of register X", but it would make decoding more complex, and decoding x86 instructions is already relatively painful.

The short answer is: because of how it evolved from 16 bits.

Why is there not a register that contains the higher bytes of EAX?

Beyond the instruction encoding issue that Jerry correctly mentions, there are other factors at work here as well.

Most non-trivial CPUs are pipelined: in ordinary operation, instructions begin executing before previous instructions have finished. This means that the processor must detect any dependencies of an instruction on earlier instructions and hold the instruction back until the data (or condition flags) it depends on are available[1].

Having names for different parts of a register complicates this dependency tracking. If I write:

mov  ax,  dx
add  eax, ecx

then the core needs to know that ax is part of eax, and that the add must wait until the result of the mov is available (the add also needs bits 16-31 of eax from whichever earlier instruction last wrote them, so its input is a merge of two results). This is called a partial register update; although it seems very simple, hardware designers generally dislike them and try to avoid having to track them as much as possible (especially in modern out-of-order processors).

Having names for the high halves of the registers adds an additional set of partial register names that must be tracked, which costs die area and power but delivers little benefit. In the end, this is how CPU design decisions are made: a tradeoff of die area (and power) against benefit.

Partial register updates aren't the only thing that would be complicated by having names for the high parts of the register, but they are one of the simplest to explain. There are many other small things that would need to become more complicated in a modern x86 CPU to support it; considered in aggregate, the additional complexity would be substantial.

[1] There are other ways to resolve dependencies, but we ignore them here for simplicity; they introduce similar problems.

To add to what Jerry and Stephen have said so far:

My first thought is that you have to be conservative with your opcodes/instruction encoding. It started out with ax, ah, and al. Is there any value added, when going to eax, in providing byte-based access to the upper half of the register (beyond the rotates and shifts that already provide it)? Not really. If you are doing byte operations, why are you using a 32-bit register, and why the upper bytes? Perhaps optimize the code differently: take advantage of what is available, or tolerate what is available and gain in other areas.

I think there is a reason that the majority of the world's instruction sets don't have this four-names-for-the-same-register thing. And I don't think patents are what's at play. In its day it was probably a cool feature or design, probably rooted in transitioning folks from 8-bit processors to this 8/16-bit thing. Anyway, I think al, ah, ax, eax was bad design and everyone learned from it. As Stephen mentioned, there are hardware issues at play: if you were to implement this strictly in direct logic, it is a mess, a rat's nest of muxes to wire everything up (bad for speed and bad for power), and then you get into the timing nightmare Stephen was talking about. But there is a history of microcoding for this instruction set, so you are essentially emulating these instructions with some other processor, and in the same way it adds to that nightmare. The wise thing to do would have been to redefine ax to be 32 bits and get rid of ah and al. Wise from a design perspective but unwise for portability (good for engineering, bad for marketing, sales, etc.). I think the reason that tired old instruction set is not confined to history books and museums is (among a few other reasons) backward compatibility.

I highly recommend learning a number of other instruction sets, both new and old: msp430, ARM, Thumb, MIPS, 6502, Z80, PIC (the old one that isn't a MIPS), etc., just to name a few. Seeing the differences and similarities between instruction sets is very educational, IMO. And depending on how deep you go into the understanding (variable vs. fixed instruction length, etc.), you can see what choices were available to Intel when making the 16- to 32-bit and, more recently, the 32- to 64-bit transitions while trying to retain market share.

I think the solution they chose at the time was the right one: insert a formerly undefined opcode in front of what normally decodes as a 16-bit instruction, turning it into a 32-bit one. When immediate values follow, the prefix also changes their size, so the decoder needs it to know how many bytes to read; when there are none, only the operand size changes. It seemed in line with the instruction set at the time. So it comes back to Jerry's answer: the reason is a combination of the design of the 8/16-bit instruction set, its history, and the reasons for expanding it. Granted, they could just as easily have used a similar encoding to provide access to the upper 16 bits in an ax/ah/al fashion, and they could just as easily have multiplied the four base registers A, B, C, D into 8 or 16 or 32 general-purpose registers (A, B, C, D, E, F, G, H, ...) while remaining backward compatible.

In fact, traditional x86 opcodes allow both operand size selection (sometimes via a specific instruction encoding, sometimes via prefix bytes) and register number selection bits. For register selection, there are always three bits in the instruction encoding. This allows for a total of eight registers.

Think of the eight codes as being handed out in steps (a way of counting the encoding space rather than literal history; the 8086 already shipped with all eight 16-bit registers). Start with four: AX/BX/BP/SP for 16bit and AL/AH/BL/BH for 8bit.

Adding two more gives CX/DX plus CL/CH/DL/DH. No 8bit codes are left, but there are still two unused values in the register selection for 16bit.

Those are taken by the index registers SI/DI.

That done, the 3 register selection bits are exhausted (which made it impossible to provide 8bit regs for SI/DI/BP/SP).

The way AMD64's 64bit mode manages to double the register set is therefore by using prefix bytes (the REX "use the new regs" prefix), similar to how traditional x86 code chooses between 16 and 32bit operations. The same method was used to provide 8bit registers where there had been none "traditionally", i.e. for SP/BP/SI/DI (SPL/BPL/SIL/DIL).

To illustrate, see the following instruction encodings:

0:     00 c0                add    %al,%al
2:     00 c1                add    %al,%cl
4:     00 c2                add    %al,%dl
6:     00 c3                add    %al,%bl
8:     00 c4                add    %al,%ah
a:     00 c5                add    %al,%ch
c:     00 c6                add    %al,%dh
e:     00 c7                add    %al,%bh
10: 40 00 c4                add    %al,%spl
13: 40 00 c5                add    %al,%bpl
16: 40 00 c6                add    %al,%sil
19: 40 00 c7                add    %al,%dil

And, for [ 16bit / 64bit ] / 32bit, side-by-side since it's so illustrative (offsets are shown as unprefixed/prefixed):

0   : [66/48] 01 c0     add   %?ax,%?ax
2/3 : [66/48] 01 c1     add   %?ax,%?cx
4/6 : [66/48] 01 c2     add   %?ax,%?dx
6/9 : [66/48] 01 c3     add   %?ax,%?bx
8/c : [66/48] 01 c4     add   %?ax,%?sp
a/f : [66/48] 01 c5     add   %?ax,%?bp
c/12: [66/48] 01 c6     add   %?ax,%?si
e/15: [66/48] 01 c7     add   %?ax,%?di

The prefix 0x66 marks a 16bit operation, and 0x48 (REX.W) is one of the prefix bytes for a 64bit op (it'd be a different one if your target and/or source were one of the "new" high-numbered registers).

To get back to your original question of how to access the high bits: well, newer CPUs have SSE instructions for that purpose. Every 8/16/32/64bit field of a vector register is separately accessible via e.g. shuffle instructions, and in fact a lot of the string manipulation code that Intel/AMD provide in their optimized libraries these days no longer uses the normal CPU registers but the vector registers instead. If you need symmetry between the upper/lower halves (or other fractions) of some larger value, use the vector registers.
