
Why do 16-bit programs run slower on x86 operating systems?

I'm studying some things about assembly, and in the material I'm reading, the author says that programs compiled for 16-bit rotate more slowly on x86 operating systems, and the same goes for x64: programs compiled for 32-bit run slower on x64...

Why does this happen? What happens in the computer's memory and processor that makes 16-bit or 32-bit programs rotate more slowly on 32-bit and 64-bit machines, respectively?

About 16-bit programs running slower on 32-bit systems, I can tell you about that. When Intel went from 16 bits to 32 bits, they had to extend the instruction set to cope with the new 32-bit registers while maintaining binary compatibility with 16-bit programs.

To accomplish that, they added a prefix (66h, the operand-size override prefix, if I remember correctly) that, when applied to any instruction that uses 16-bit registers, makes that instruction use 32-bit registers instead.

For instance, a 16-bit instruction like MOV AX,BX, prefixed with 66h, turns into MOV EAX,EBX.
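
To make the prefix mechanism concrete: one valid encoding of MOV AX,BX is the two bytes 89 D8 (opcode 89h, MOV r/m,reg, whose ModRM byte D8h selects AX as the destination and BX as the source). The Python sketch below is a hypothetical mini-decoder that understands only that register-to-register form; `decode_mov` and its `default_bits` parameter (the default operand size of the code segment) are illustrative names, not part of any real disassembler.

```python
REGS16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]
REGS32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]


def decode_mov(code: bytes, default_bits: int) -> str:
    """Decode `[66h] 89h modrm` (MOV r/m,reg with register operands only)."""
    i = 0
    size = default_bits
    if code[i] == 0x66:
        # Operand-size override: flip between 16 and 32 bits for this instruction.
        size = 32 if default_bits == 16 else 16
        i += 1
    if code[i] != 0x89:
        raise ValueError("only opcode 89h is modelled")
    modrm = code[i + 1]
    if modrm >> 6 != 0b11:
        raise ValueError("only register-to-register forms are modelled")
    regs = REGS32 if size == 32 else REGS16
    dst = regs[modrm & 0b111]         # r/m field (bits 0-2): destination
    src = regs[(modrm >> 3) & 0b111]  # reg field (bits 3-5): source
    return f"MOV {dst},{src}"


if __name__ == "__main__":
    print(decode_mov(bytes([0x89, 0xD8]), 16))        # MOV AX,BX
    print(decode_mov(bytes([0x66, 0x89, 0xD8]), 16))  # MOV EAX,EBX
```

Note that the prefix does not name a size; it only flips the segment's default, so the same bytes 66 89 D8 mean MOV AX,BX when the default is 32 bits.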

But this imposes a penalty on the new 32-bit instructions, because the extra prefix byte needs at least one extra memory fetch cycle before the instruction can execute. So Intel created the so-called 32-bit segments and 16-bit segments.

Basically, any piece of code must reside in a code segment. Before the 80386, all segments used 16-bit instructions, and all instructions were assumed to use 16-bit registers.

Intel's 32-bit segments contain code as well, but this time every instruction is assumed to use 32-bit registers, so in a 32-bit segment the opcode of MOV EAX,EBX is the same as the opcode of MOV AX,BX in a 16-bit segment. Which of the two kinds a segment is, is selected by the D flag in its segment descriptor.
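
On the 80386 and later, the 16-bit/32-bit choice for a code segment lives in its 8-byte segment descriptor: bit 22 of the descriptor's high doubleword is the D (default operation size) flag, 0 for 16-bit and 1 for 32-bit. A minimal Python sketch that reads it from a raw descriptor value follows; the function name is hypothetical, and descriptors with the L flag set (bit 21, 64-bit code in long mode) are deliberately rejected rather than modelled. The two example values are a common flat 4 GiB ring-0 code descriptor and a 64 KiB 16-bit one.

```python
def code_segment_default_size(descriptor: int) -> int:
    """Return 16 or 32: the default operand size of a legacy code-segment descriptor.

    The flags sit in the descriptor's high doubleword: bit 22 is D (default size),
    bit 21 is L (64-bit code). L=1 descriptors are outside this sketch's scope.
    """
    high = (descriptor >> 32) & 0xFFFFFFFF
    if (high >> 21) & 1:
        raise ValueError("L flag set: 64-bit code segment, not modelled here")
    return 32 if (high >> 22) & 1 else 16


if __name__ == "__main__":
    flat_32bit_code = 0x00CF9A000000FFFF   # base 0, 4 GiB limit, G=1, D=1
    small_16bit_code = 0x00009A000000FFFF  # base 0, 64 KiB limit, G=0, D=0
    print(code_segment_default_size(flat_32bit_code))   # 32
    print(code_segment_default_size(small_16bit_code))  # 16
```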

This means a program doesn't have to put the 66h prefix on every 32-bit instruction, so there's no penalty anymore.

But... what if I have to use 16-bit registers within a program that is contained in a 32-bit segment? Those instructions using 16-bit registers will have to use the 66h prefix.

So: instructions that use 16-bit registers are unprefixed in 16-bit segments and prefixed in 32-bit segments. Instructions that use 32-bit registers are unprefixed in 32-bit segments and prefixed in 16-bit segments.
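
The four cases can be tabulated with a small Python sketch (the function name is hypothetical) that builds the bytes for MOV AX,BX or MOV EAX,EBX in either kind of segment, reusing the 89 D8 encoding, and reports the instruction length. The one extra byte in the mismatched cases is the prefix cost described above.

```python
def encode_mov_a_b(segment_bits: int, operand_bits: int) -> bytes:
    """Encode MOV AX,BX (operand_bits=16) or MOV EAX,EBX (operand_bits=32)."""
    if segment_bits not in (16, 32) or operand_bits not in (16, 32):
        raise ValueError("only 16- and 32-bit sizes are modelled")
    body = bytes([0x89, 0xD8])  # opcode 89h + ModRM D8h (r/m = AX/EAX, reg = BX/EBX)
    # The 66h operand-size prefix is needed only when the operand size
    # differs from the segment's default.
    return body if operand_bits == segment_bits else b"\x66" + body


if __name__ == "__main__":
    for seg in (16, 32):
        for op in (16, 32):
            code = encode_mov_a_b(seg, op)
            print(f"{seg}-bit segment, {op}-bit registers: {code.hex(' ')} ({len(code)} bytes)")
```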

Besides, starting with the Pentium processor, there are two pipelines for executing instructions in parallel. For both pipelines to be used, the instructions entering them must belong to what Intel calls the "RISC nucleus": a subset of instructions that are no longer executed as a microprogram inside the CPU but by hardwired logic. Guess what? Prefixed instructions, and code executing in a 16-bit segment using 16-bit registers, don't belong to this group and therefore cannot execute in parallel with another instruction. When a prefixed instruction manages to enter one of the pipelines, the other one is stalled, which hurts the CPU's performance.
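
The throughput cost of unpairable instructions can be illustrated with a deliberately simplified Python model. It assumes, following the description above, that each cycle the next instruction enters the first pipe and the one after it joins the second pipe only when both are marked pairable. The `Insn` type and `issue_cycles` function are hypothetical, and real Pentium pairing rules (register dependencies, which instructions may go in which pipe, extra decode cycles for prefixes) are not modelled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Insn:
    """One instruction in the toy model (hypothetical type)."""
    text: str
    pairable: bool  # True = simple and unprefixed; may share a cycle with a neighbour


def issue_cycles(program: list[Insn]) -> int:
    """Count issue cycles in a toy two-pipe model.

    Each cycle the next instruction enters the first pipe; the instruction after it
    enters the second pipe in the same cycle only if both are pairable. Dependencies,
    per-pipe restrictions and decode penalties are ignored.
    """
    cycles = 0
    i = 0
    while i < len(program):
        cycles += 1
        both_pairable = (
            i + 1 < len(program) and program[i].pairable and program[i + 1].pairable
        )
        i += 2 if both_pairable else 1
    return cycles


if __name__ == "__main__":
    plain = [Insn("add ax,bx", True), Insn("mov cx,dx", True)] * 2
    prefixed = [Insn("66h add eax,ebx", False), Insn("66h mov ecx,edx", False)] * 2
    print(issue_cycles(plain), issue_cycles(prefixed))  # 2 4
```

In this model the same four operations take twice as many cycles once every instruction carries a prefix, which is the kind of slowdown the paragraph above describes.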

About "programs rotate more slowly"... well, programs don't "rotate", they "are executed". If you are talking about the bit rotation instructions... well. It happens that the 8086 has two versions of each rotate instruction: one that uses an immediate argument specifying the number of bits to rotate, and another that uses a register (CL) to specify it.

The thing is that 8086 processors don't allow any value other than 1 for the immediate argument (but the value in CL can be greater than 1). The 80186 and later processors allow any value as the immediate operand. Besides, processors from the 80286 onward use only the lower 5 bits of the operand that specifies the rotation count, so the count never exceeds 31 (it's pointless to rotate a 32-bit register more than 31 times). The 8086 doesn't impose this limit and therefore can spend more time on the operation.
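
The difference between an unmasked and a masked rotate count can be sketched in Python. `rol_unmasked` models the 8086 behaviour described above by rotating one bit per step and counting the steps; `rol_masked` models the later behaviour by keeping only the low 5 bits of the count and rotating in one go. Both names are hypothetical, and the step count is only a stand-in for time spent, not real cycle timings.

```python
def rol_unmasked(value: int, count: int, width: int = 16) -> tuple[int, int]:
    """8086-style model: rotate left one bit per step, `count` steps, no masking.

    Returns (result, steps); steps is only a proxy for time spent.
    """
    mask = (1 << width) - 1
    value &= mask
    steps = 0
    for _ in range(count):
        top_bit = (value >> (width - 1)) & 1
        value = ((value << 1) & mask) | top_bit
        steps += 1
    return value, steps


def rol_masked(value: int, count: int, width: int = 16) -> int:
    """Later-CPU model: only the low 5 bits of the count are used, one-shot rotate."""
    mask = (1 << width) - 1
    value &= mask
    n = (count & 0x1F) % width
    return ((value << n) | (value >> (width - n))) & mask


if __name__ == "__main__":
    # CL = 200: the unmasked model walks 200 one-bit steps; the masked one uses 200 & 31 = 8.
    print(rol_unmasked(0x1234, 200))     # (13330, 200), i.e. result 0x3412
    print(hex(rol_masked(0x1234, 200)))  # 0x3412
```

For a plain rotate of a 16- or 32-bit value the two models always agree, because rotating by the full width is a no-op; masking only removes work that could never change the result.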

I don't really know if this is what your book means by "rotating more slowly". I recall that the rotate operation can only be performed in one of the pipelines, not both, so two consecutive rotate instructions cannot be paired.

I'm not sure what you mean by rotate (the assembly operations?), but in general there could be several factors here:

  1. CPU companies don't really put much effort into optimizing old legacy modes and ISA subsets. x87 is a good example: anything that doesn't really require its level of precision is better off using SSE/AVX for performance-critical tasks, and not just because of vectorization.

  2. Every time the x86 CPU companies increased their register sizes, they kept the old register set and just added logical names for the longer versions. Compatibility demands that old operations still work on the same registers, so you can now write to ah/al, ax, eax and rax in the same program. In some of these cases (namely the 8-bit and 16-bit partial registers), this compatibility requires the CPU to keep the upper part of the register intact when writing only to the lower part, which implicitly introduces a merge operation that may cause slowdowns. Worse, you can introduce false dependencies, since each write to a 16-bit register has to be merged with the upper part left over from earlier operations.
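
The register semantics behind point 2 can be modelled in Python by treating rax as a 64-bit integer. Writing al, ah or ax must combine the new bits with the old rax, so the old value is an input to the write (the merge and false dependency described above), while in 64-bit mode a write to eax zero-extends into rax and does not depend on the old value at all. The function names are hypothetical; this is a model of the architectural result, not of any CPU's internal renaming.

```python
MASK64 = (1 << 64) - 1


def write_al(rax: int, value: int) -> int:
    # Bits 8..63 are preserved, so the old RAX is an input (a merge).
    return (rax & (MASK64 ^ 0xFF)) | (value & 0xFF)


def write_ah(rax: int, value: int) -> int:
    # Only bits 8..15 change; everything else comes from the old RAX.
    return (rax & (MASK64 ^ 0xFF00)) | ((value & 0xFF) << 8)


def write_ax(rax: int, value: int) -> int:
    # Bits 16..63 are preserved: another merge with the old value.
    return (rax & (MASK64 ^ 0xFFFF)) | (value & 0xFFFF)


def write_eax(rax: int, value: int) -> int:
    # In 64-bit mode a 32-bit write zero-extends; `rax` is deliberately unused,
    # so there is no dependency on the previous value.
    return value & 0xFFFFFFFF


if __name__ == "__main__":
    rax = 0x1122334455667788
    print(hex(write_al(rax, 0xCC)))         # 0x11223344556677cc
    print(hex(write_ah(rax, 0xDD)))         # 0x112233445566dd88
    print(hex(write_ax(rax, 0xAAAA)))       # 0x112233445566aaaa
    print(hex(write_eax(rax, 0xBBBBBBBB)))  # 0xbbbbbbbb
```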

See also here: Why do most x64 instructions zero the upper part of a 32 bit register

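Statement: The technical posts on this site are published under the CC BY-SA 4.0 license. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.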

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM