
Signed and unsigned arithmetic implementation on x86

C has signed and unsigned types such as char and int. I am not sure how this is implemented at the assembly level. For example, it seems to me that signed and unsigned multiplication would give different results, so does the assembly do both unsigned and signed arithmetic, or only one, with the other case emulated in some way?

If you look at the various multiplication instructions of x86, looking only at the 32-bit variants and ignoring BMI2, you will find these:

  • imul r/m32 (32x32->64 signed multiply)
  • imul r32, r/m32 (32x32->32 multiply) *
  • imul r32, r/m32, imm (32x32->32 multiply) *
  • mul r/m32 (32x32->64 unsigned multiply)

Notice that only the "widening" multiply has an unsigned counterpart. The two forms in the middle, marked with an asterisk, serve as both signed and unsigned multiplication: when you don't get that extra "upper part", the two are the same thing.

The "widening" multiplications have no direct equivalent in C, but compilers can (and often do) use those forms anyway.
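To see that only the upper half depends on signedness, you can widen to 64 bits by hand in C. This is a minimal sketch, not part of the original answer: the function names are made up for illustration, and it assumes that converting an out-of-range uint32_t to int32_t wraps (which GCC and Clang do, although standard C leaves it implementation-defined).

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned 32x32->64 product, the value `mul r/m32` leaves in edx:eax. */
uint64_t widen_unsigned(uint32_t a, uint32_t b)
{
    return (uint64_t)a * b;
}

/* Signed 32x32->64 product of the same bit patterns, as the one-operand
   `imul` would compute it. Assumes uint32_t -> int32_t conversion wraps. */
uint64_t widen_signed(uint32_t a, uint32_t b)
{
    return (uint64_t)((int64_t)(int32_t)a * (int32_t)b);
}

/* Illustrative self-check: 0xFFFFFFFF is 4294967295 unsigned, -1 signed. */
void check_halves(void)
{
    uint64_t u = widen_unsigned(0xFFFFFFFFu, 2u);  /* 0x00000001FFFFFFFE */
    uint64_t s = widen_signed(0xFFFFFFFFu, 2u);    /* 0xFFFFFFFFFFFFFFFE */

    assert((uint32_t)u == (uint32_t)s);            /* low halves agree   */
    assert((u >> 32) != (s >> 32));                /* high halves differ */
}
```

Calling check_halves() from a main function should pass silently; the two products only disagree in the half that a 32x32->32 multiply throws away.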

For example, if you compile this:

#include <stdint.h>

/* Two overloads of test, so this is compiled as C++. */
uint32_t test(uint32_t a, uint32_t b)
{
    return a * b;
}

int32_t test(int32_t a, int32_t b)
{
    return a * b;
}

With GCC or some other relatively reasonable compiler, you'd get something like this:

test(unsigned int, unsigned int):
    mov eax, edi
    imul    eax, esi
    ret
test(int, int):
    mov eax, edi
    imul    eax, esi
    ret

(actual GCC output with -O1)


So signedness doesn't matter for multiplication (at least not for the kind of multiplication you use in C), nor for some other operations, namely:

  • addition and subtraction
  • bitwise AND, OR, XOR, NOT
  • negation
  • left shift
  • comparing for equality

x86 doesn't offer separate signed/unsigned versions of those, because there's no difference anyway.
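A quick way to convince yourself is to do the work on unsigned values and then read the resulting bits as signed. The sketch below is only an illustration with made-up helper names; it assumes the uint32_t to int32_t conversion wraps, as it does on GCC and Clang.

```c
#include <assert.h>
#include <stdint.h>

/* Reinterpret a 32-bit pattern as signed. Assumes the conversion wraps
   (true for GCC and Clang; implementation-defined in standard C). */
int32_t as_signed(uint32_t bits) { return (int32_t)bits; }

/* Each helper works on unsigned values, so wrap-around is well defined. */
uint32_t add_bits(uint32_t a, uint32_t b) { return a + b; }
uint32_t sub_bits(uint32_t a, uint32_t b) { return a - b; }
uint32_t neg_bits(uint32_t a)             { return 0u - a; }
uint32_t shl_bits(uint32_t a, unsigned n) { return a << n; }  /* n < 32 */

void check_same_bits(void)
{
    uint32_t minus5 = (uint32_t)-5;   /* bit pattern 0xFFFFFFFB */

    /* The unsigned operations give the bits of the signed answers. */
    assert(as_signed(add_bits(minus5, 3u)) == -2);
    assert(as_signed(sub_bits(minus5, 3u)) == -8);
    assert(as_signed(neg_bits(minus5)) == 5);
    assert(as_signed(shl_bits(minus5, 1)) == -10);
}
```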

But for some operations there is a difference, for example:

  • division (idiv vs div)
  • remainder (also idiv vs div)
  • right shift (sar vs shr) (but beware of signed right shift in C)
  • comparing for bigger than / smaller than
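The same bit pattern gives different answers for these operations depending on how you read it. Here is a small illustrative sketch (the function name is invented) using the pattern 0xFFFFFFF8, which is -8 signed and 4294967288 unsigned:

```c
#include <assert.h>
#include <stdint.h>

void check_signedness_matters(void)
{
    int32_t  s = -8;
    uint32_t u = (uint32_t)s;          /* same bits: 4294967288 */

    /* Division and remainder: idiv for s, div for u. */
    assert(s / 3 == -2 && s % 3 == -2);
    assert(u / 3u == 1431655762u && u % 3u == 2u);

    /* Ordering: the compiler picks a signed or unsigned condition. */
    assert(s < 1);
    assert(!(u < 1u));

    /* Logical right shift (shr) of the unsigned value. */
    assert((u >> 1) == 0x7FFFFFFCu);
    /* s >> 1 is implementation-defined in C for negative s; compilers
       targeting x86 typically emit sar, which would give -4. */
}
```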

But that last one is special: x86 doesn't have separate signed and unsigned versions of it either. Instead it has one operation (cmp, which is really just a nondestructive sub) that does both at once and gives several results (multiple bits in "the flags" are affected). Later instructions that actually use those flags (branches, conditional moves, setcc) then choose which flags they care about. So for example,

cmp a, b
jg somewhere

will jump to somewhere if a is "signed greater than" b.

cmp a, b
jb somewhere

would jump to somewhere if a is "unsigned below" b.

See "Assembly - JG/JNLE/JL/JNGE after CMP" for more about the flags and branches.
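If it helps, here is a rough C model of what cmp and the two branches above look at. It is a sketch for illustration only (the type and function names are invented, and real hardware sets more flags than these four): jg is taken when ZF = 0 and SF = OF, and jb is taken when CF = 1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf, cf, sf, of; } flags_t;

/* Model of `cmp a, b`: compute a - b, keep only the flags. */
flags_t cmp_flags(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;
    flags_t f;
    f.zf = (r == 0);
    f.cf = (a < b);                            /* unsigned borrow         */
    f.sf = (r >> 31) != 0;                     /* sign bit of the result  */
    f.of = (((a ^ b) & (a ^ r)) >> 31) != 0;   /* signed overflow         */
    return f;
}

bool taken_jg(flags_t f) { return !f.zf && f.sf == f.of; }  /* signed >   */
bool taken_jb(flags_t f) { return f.cf; }                   /* unsigned < */

void check_branches(void)
{
    /* 1 vs 0xFFFFFFFF: 1 > -1 signed, and 1 < 4294967295 unsigned. */
    flags_t f = cmp_flags(1u, 0xFFFFFFFFu);
    assert(taken_jg(f));
    assert(taken_jb(f));

    /* INT32_MIN vs 1: neither greater (signed) nor below (unsigned). */
    f = cmp_flags(0x80000000u, 1u);
    assert(!taken_jg(f));
    assert(!taken_jb(f));
}
```

One cmp produces all the flags; which interpretation you get depends entirely on the branch that follows.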


This won't be a formal proof that signed and unsigned multiplication are the same; I'll just try to give you some insight into why they should be the same.

Consider 4-bit two's-complement integers. The weights of their individual bits are, from lsb to msb, 1, 2, 4, and -8. When you multiply two of those numbers, you can decompose one of them into 4 parts corresponding to its bits, for example:

0011 (decompose this one to keep it interesting)
0010
---- *
0010 (from the bit with weight 1)
0100 (from the bit with weight 2, so shifted left 1)
---- +
0110

2 * 3 = 6, so everything checks out. That's just the regular long multiplication that most people learn in school, only in binary, which makes it a lot easier: you don't have to multiply by a decimal digit, you only have to multiply by 0 or 1, and shift.

Anyway, now take a negative number. The weight of the sign bit is -8, so at some point you will make a partial product -8 * something. A multiplication by 8 is a shift left by 3, so the former lsb is now the msb, and all other bits are 0. Now if you negate that (it was -8 after all, not 8), nothing happens. Zero is obviously unchanged by negation, but so is 8, and in general so is the number with only the msb set:

-1000 = ~1000 + 1 = 0111 + 1 = 1000

So you've done the same thing you would have done if the weight of the msb were 8 (as in the unsigned case) instead of -8.
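Since there are only 256 pairs of 4-bit numbers, you can also just check all of them. The following is an illustrative brute-force sketch (helper names are made up), comparing the low 4 bits of the signed and unsigned products:

```c
#include <assert.h>
#include <stdbool.h>

/* Interpret the low 4 bits of x as a two's-complement value in [-8, 7]. */
int sext4(unsigned x)
{
    x &= 0xFu;
    return (x & 0x8u) ? (int)x - 16 : (int)x;
}

/* True if, for every pair of 4-bit patterns, the low 4 bits of the signed
   product equal the low 4 bits of the unsigned product. */
bool low4_products_agree(void)
{
    for (unsigned a = 0; a < 16; a++) {
        for (unsigned b = 0; b < 16; b++) {
            unsigned unsigned_low = (a * b) & 0xFu;
            unsigned signed_low = (unsigned)(sext4(a) * sext4(b)) & 0xFu;
            if (unsigned_low != signed_low)
                return false;
        }
    }
    return true;
}

void check_low4(void)
{
    assert(sext4(0x8u) == -8);   /* the msb alone has weight -8 */
    assert(low4_products_agree());
}
```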

Most modern processors support both signed and unsigned arithmetic. Any arithmetic that is not supported natively has to be emulated.

Quoting from this answer for the x86 architecture:

Firstly, x86 has native support for the two's complement representation of signed numbers. You can use other representations but this would require more instructions and generally be a waste of processor time.

What do I mean by "native support"? Basically I mean that there is a set of instructions you use for unsigned numbers and another set that you use for signed numbers. Unsigned numbers can sit in the same registers as signed numbers, and indeed you can mix signed and unsigned instructions without worrying the processor. It's up to the compiler (or assembly programmer) to keep track of whether a number is signed or not, and use the appropriate instructions.

Firstly, two's complement numbers have the property that addition and subtraction are just the same as for unsigned numbers. It makes no difference whether the numbers are positive or negative. (So you just go ahead and ADD and SUB your numbers without a worry.)

The differences start to show when it comes to comparisons. x86 has a simple way of differentiating them: above/below indicates an unsigned comparison and greater/less than indicates a signed comparison. (E.g. JAE means "Jump if above or equal" and is unsigned.)

There are also two sets of multiplication and division instructions to deal with signed and unsigned integers.

Lastly: if you want to check for, say, overflow, you would do it differently for signed and for unsigned numbers.
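As a rough illustration of that last point in C (this is not from the quoted answer, and the function names are invented), an unsigned addition overflows when the sum wraps around, while a signed addition has to be tested before it is performed, because signed overflow is undefined behaviour in C:

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Unsigned overflow (what CF reports after add): the sum wrapped around. */
bool add_overflows_unsigned(unsigned a, unsigned b)
{
    return a + b < a;
}

/* Signed overflow (what OF reports after add), tested before adding. */
bool add_overflows_signed(int a, int b)
{
    return (b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b);
}

void check_overflow(void)
{
    /* All bits set: UINT_MAX unsigned, -1 signed. */
    assert(add_overflows_unsigned(UINT_MAX, 1u));
    assert(!add_overflows_signed(-1, 1));

    /* Only the top bit clear: fine unsigned, overflows signed. */
    assert(!add_overflows_unsigned((unsigned)INT_MAX, 1u));
    assert(add_overflows_signed(INT_MAX, 1));
}
```

The same two bit patterns overflow in one interpretation and not in the other, which is why the check depends on signedness even though the add itself does not.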

A little supplement on cmp and sub. We know cmp can be regarded as a non-destructive sub, so let's focus on sub.

When an x86 CPU executes a sub instruction, for example,

sub ebx, eax    ; ebx = ebx - eax

how does the CPU know whether the values in eax and ebx are signed or unsigned? For example, consider 4-bit-wide numbers in two's complement:

eax: 0b0001
ebx: 0b1111

Whether signed or unsigned, the value of eax is interpreted as 1(dec), which is fine.

However, if ebx is unsigned, it is interpreted as 15(dec), and the result becomes:

ebx: 15(dec) - eax: 1(dec) = 14(dec) = 0b1110 (two's complement)

If ebx is signed, it is interpreted as -1(dec), and the result becomes:

ebx: -1(dec) - eax: 1(dec) = -2(dec) = 0b1110 (two's complement)

In both the signed and the unsigned case, the encoding of the result in two's complement is the same: 0b1110.

But one is positive, 14(dec), and the other is negative, -2(dec), which brings us back to the question: how does the CPU tell which is which?

The answer is that the CPU evaluates both. From http://x86.renejeschke.de/html/file_module_x86_id_308.html:

It evaluates the result for both signed and unsigned integer operands and sets the OF and CF flags to indicate an overflow in the signed or unsigned result, respectively. The SF flag indicates the sign of the signed result.

For this specific example, when the CPU sees the result 0b1110, it sets the SF flag to 1, because the top bit is set: interpreted as a signed number, 0b1110 is -2(dec).

It is then up to the following instructions whether they use the SF flag or simply ignore it.
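To make the 4-bit example 0b1111 - 0b0001 concrete, here is a toy C model of a 4-bit sub that produces the result together with CF, OF and SF. It is only a sketch for illustration; the struct and function names are invented, and a real CPU works on 8, 16, 32 or 64 bits and sets more flags than these.

```c
#include <assert.h>
#include <stdbool.h>

typedef struct {
    unsigned result;  /* low 4 bits of a - b                      */
    bool cf;          /* borrow: the unsigned result wrapped      */
    bool of;          /* the signed result left the range [-8, 7] */
    bool sf;          /* top bit of the result                    */
} sub4_t;

/* Toy 4-bit model of `sub`: one subtraction, three flags. */
sub4_t sub4(unsigned a, unsigned b)
{
    sub4_t r;
    a &= 0xFu;
    b &= 0xFu;
    r.result = (a - b) & 0xFu;
    r.cf = a < b;
    r.sf = (r.result & 0x8u) != 0;
    r.of = (((a ^ b) & (a ^ r.result)) & 0x8u) != 0;
    return r;
}

void check_sub4(void)
{
    sub4_t r = sub4(0xFu, 0x1u);   /* 15 - 1 unsigned, -1 - 1 signed  */
    assert(r.result == 0xEu);      /* 0b1110: 14 unsigned, -2 signed  */
    assert(!r.cf && !r.of && r.sf);

    r = sub4(0x1u, 0xFu);          /* 1 - 15 unsigned, 1 - (-1) signed */
    assert(r.result == 0x2u);      /* wrapped unsigned, 2 signed       */
    assert(r.cf && !r.of && !r.sf);
}
```

In the second case CF is set because 1 - 15 does not fit as an unsigned result, while OF stays clear because 1 - (-1) = 2 is a perfectly good signed result.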

