Why is this bit-hack code portable?

int v;
int sign; // the sign of v ;
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));

Q1: Since v is defined as an int, why bother casting it to int again? Is it related to portability?

Edit:

Q2:

sign = v >> (sizeof(int) * CHAR_BIT - 1); 

This snippet isn't portable, since right-shifting a negative signed int is implementation-defined: how the vacated left bits are filled is up to the compiler. So

 -(int)((unsigned int)((int)v) 

does the portable trick. Please explain why this works. Doesn't a right shift of an unsigned int always fill the left bits with 0?

It's not strictly portable, since it is theoretically possible that int and/or unsigned int have padding bits.

In a hypothetical implementation where unsigned int has padding bits, shifting right by sizeof(int)*CHAR_BIT - 1 would produce undefined behaviour, since then (with WIDTH being the number of value bits of unsigned int)

sizeof(int)*CHAR_BIT - 1 >= WIDTH
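A small sketch of that width check, under the same assumptions as this answer (the helper names uint_value_bits and uint_has_no_padding are made up for illustration): it counts the value bits of unsigned int from UINT_MAX and compares them with sizeof(unsigned int) * CHAR_BIT. The shift in the question is only safe when the two agree.

```c
#include <limits.h>  /* CHAR_BIT, UINT_MAX */

/* Hypothetical helper: counts the value bits (WIDTH) of unsigned int
   by shifting UINT_MAX right one bit at a time until it becomes 0. */
static unsigned uint_value_bits(void)
{
    unsigned int max = UINT_MAX;
    unsigned bits = 0;
    while (max != 0) {
        max >>= 1;   /* shift by 1 is always smaller than WIDTH */
        ++bits;
    }
    return bits;
}

/* Hypothetical helper: nonzero if unsigned int has no padding bits,
   i.e. WIDTH == sizeof(unsigned int) * CHAR_BIT.  Only then is a shift
   by sizeof(int) * CHAR_BIT - 1 guaranteed to be less than WIDTH
   (int and unsigned int always have the same size). */
static int uint_has_no_padding(void)
{
    return uint_value_bits() == sizeof(unsigned int) * CHAR_BIT;
}
```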

But for all implementations where unsigned int has no padding bits - and as far as I know that means all existing implementations - the code

int v;
int sign; // the sign of v ;
sign = -(int)((unsigned int)((int)v) >> (sizeof(int) * CHAR_BIT - 1));

must set sign to -1 if v < 0 and to 0 if v >= 0. (Note - thanks to Sander De Dycker for pointing it out - that if int has a negative zero, that would also produce sign = 0, since -0 == 0. If the implementation supports negative zeros and the sign for a negative zero should be -1, neither this shifting nor the comparison v < 0 would produce that; a direct inspection of the object representation would be required.)

The cast to int (before the cast to unsigned int, before the shift) is entirely superfluous and does nothing.

It is - disregarding the hypothetical padding bits problem - portable, because the conversion to unsigned integer types and the representation of unsigned integer types are prescribed by the standard.

Conversion to an unsigned integer type is reduction modulo 2^WIDTH, where WIDTH is the number of value bits in the type, so that the result lies in the range 0 to 2^WIDTH - 1 inclusive.
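To make the modulo rule concrete, here is a sketch (convert_by_rule is a hypothetical helper, not a standard function) that computes 2^WIDTH - k for a negative value -k using only unsigned arithmetic, so it can be compared with what the (unsigned int) cast actually yields:

```c
#include <limits.h>  /* UINT_MAX, INT_MIN */

/* Illustrative helper: the value (unsigned int)x must have according to
   the conversion rule.  For x >= 0 the value is unchanged; for x < 0 it
   is x + 2^WIDTH, written here as UINT_MAX - (|x| - 1) so that no
   intermediate result overflows. */
static unsigned int convert_by_rule(int x)
{
    if (x >= 0)
        return (unsigned int)x;
    /* -(x + 1) lies in [0, INT_MAX] for every negative x, even INT_MIN */
    unsigned int magnitude_minus_one = (unsigned int)(-(x + 1));
    return UINT_MAX - magnitude_minus_one;
}
```

On any conforming implementation, convert_by_rule(x) == (unsigned int)x should hold for every int x; for example, both give UINT_MAX for x = -1.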

Since without padding bits in unsigned int the size of the range of int cannot be larger than that of unsigned int, and the standard mandates (6.2.6.2) that signed integers are represented in one of

  • sign and magnitude
  • ones' complement
  • two's complement

the smallest possible representable int value is -2^(WIDTH-1). So a negative int of value -k is converted to 2^WIDTH - k >= 2^(WIDTH-1) and thus has the most significant bit set.

A non-negative int value, on the other hand, cannot be larger than 2^(WIDTH-1) - 1, hence its value will be preserved by the conversion and the most significant bit will not be set.

So when the result of the conversion is shifted by WIDTH - 1 bits to the right (again, we assume no padding bits in unsigned int, hence WIDTH == sizeof(int)*CHAR_BIT), it will produce a 0 if the int value was non-negative, and a 1 if it was negative.
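Putting the steps together, a sketch of the question's expression as a function with the redundant (int) cast dropped (sign_via_shift is an illustrative name; like the explanation above, it assumes unsigned int has no padding bits):

```c
#include <limits.h>  /* CHAR_BIT */

/* The question's expression, one step per line.  Assumes no padding
   bits in unsigned int, so the shift count equals WIDTH - 1. */
static int sign_via_shift(int v)
{
    unsigned int u = (unsigned int)v;                     /* modulo 2^WIDTH */
    unsigned int msb = u >> (sizeof(int) * CHAR_BIT - 1); /* zero-filled: 0 or 1 */
    return -(int)msb;                                     /* 0 -> 0, 1 -> -1 */
}
```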

Nope, it's just excessive casting. There is no need to cast it to an int. It doesn't hurt, however.

Edit: It's worth noting that it may have been written like that so that the type of v can be changed to something else, or v may once have been another data type and the cast was never removed after it was changed to int.

It should be quite portable, because when you convert int to unsigned int (via a cast), you receive a value whose bits are the 2's complement representation of the value of the original int, with the most significant bit being the sign bit.

UPDATE: A more detailed explanation...

I'm assuming there are no padding bits in int and unsigned int, and that all bits in the two types are used to represent integer values. It's a reasonable assumption for modern hardware. Padding bits are a thing of the past; the current and recent C standards still carry them around for the purpose of backward compatibility (i.e. to be able to run code on old machines).

With that assumption, if int and unsigned int have N bits in them (N = CHAR_BIT * sizeof(int)), then per the C standard we have 3 options to represent int, which is a signed type:

  1. sign-and-magnitude representation, allowing values from -(2^(N-1) - 1) to 2^(N-1) - 1
  2. one's complement representation, also allowing values from -(2^(N-1) - 1) to 2^(N-1) - 1
  3. two's complement representation, allowing values from -2^(N-1) to 2^(N-1) - 1 or, possibly, from -(2^(N-1) - 1) to 2^(N-1) - 1

The sign-and-magnitude and one's complement representations are also a thing of the past, but let's not throw them out just yet.

When we convert int to unsigned int, the rule is that a non-negative value v (>= 0) doesn't change, while a negative value v (< 0) changes to the positive value 2^N + v, hence (unsigned int)-1 == UINT_MAX.

Therefore, (unsigned int)v for a non-negative v will always be in the range from 0 to 2^(N-1) - 1, and the most significant bit of (unsigned int)v will be 0.

Now, for a negative v in the range from -2^(N-1) to -1 (this range is a superset of the negative ranges for the three possible representations of int), (unsigned int)v will be in the range from 2^N + (-2^(N-1)) to 2^N + (-1); simplifying, we arrive at the range from 2^(N-1) to 2^N - 1. Clearly, the most significant bit of this value will always be 1.
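The range argument can be spot-checked at its endpoints. A minimal sketch, assuming no padding bits (top_bit_set is a hypothetical helper): INT_MIN and -1 should have the top bit of (unsigned int)v set, while 0 and INT_MAX should not.

```c
#include <limits.h>  /* CHAR_BIT, INT_MIN, INT_MAX */

/* Hypothetical helper: nonzero if the most significant bit of
   (unsigned int)v is set.  Assumes no padding bits, so the top bit is
   bit sizeof(int) * CHAR_BIT - 1 and has the value 2^(N-1). */
static int top_bit_set(int v)
{
    const unsigned int top = 1u << (sizeof(int) * CHAR_BIT - 1);
    return ((unsigned int)v & top) != 0;
}
```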

If you look carefully at all this math, you will see that the value of (unsigned)v looks exactly the same in binary as v in 2's complement representation:

...
v = -2: (unsigned)v = 2^N - 2 = 111...110 (binary)
v = -1: (unsigned)v = 2^N - 1 = 111...111 (binary)
v =  0: (unsigned)v = 0       = 000...000 (binary)
v =  1: (unsigned)v = 1       = 000...001 (binary)
...
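A rough way to reproduce the table above on your own machine: write out the bits of (unsigned int)v starting from the most significant end. The helper unsigned_bits is hypothetical, and it assumes N == sizeof(int) * CHAR_BIT, i.e. no padding bits.

```c
#include <limits.h>  /* CHAR_BIT */
#include <stddef.h>  /* size_t */

/* Writes the N bits of (unsigned int)v into buf, most significant bit
   first, followed by '\0'.  buf must hold sizeof(int) * CHAR_BIT + 1
   chars.  Assumes no padding bits (N == sizeof(int) * CHAR_BIT). */
static void unsigned_bits(int v, char *buf)
{
    const size_t n = sizeof(int) * CHAR_BIT;
    unsigned int u = (unsigned int)v;
    for (size_t i = 0; i < n; ++i) {
        /* bit number (n - 1 - i) goes into position i of the string */
        buf[i] = ((u >> (n - 1 - i)) & 1u) ? '1' : '0';
    }
    buf[n] = '\0';
}
```

Calling it with v = -2, -1, 0 and 1 and printing buf should give the four rows shown, with N digits in each.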

So the most significant bit of the value (unsigned)v is going to be 0 for v >= 0 and 1 for v < 0.

Now, let's get back to the sign-and-magnitude and one's complement representations. These two representations may allow two zeroes, a +0 and a -0. But arithmetic computations do not visibly distinguish between +0 and -0; it's still a 0, whether you add it, subtract it, multiply it or compare it. You, as an observer, normally wouldn't see +0 or -0 or any difference from having one or the other.

Trying to observe and distinguish +0 and -0 is generally pointless, and you should not normally expect or rely on the presence of two zeroes if you want to make your code portable.

(unsigned int)v won't tell you the difference between v=+0 and v=-0; in both cases (unsigned int)v will be equal to 0u.

So, with this method you won't be able to tell whether internally v is a -0 or a +0, and you won't extract v's sign bit this way for v=-0.

But again, you gain nothing of practical value from differentiating between the two zeroes, and you don't want this differentiation in portable code.

So, with this I dare to declare the sign-extraction method presented in the question quite/very/pretty-much/etc. portable in practice.

This method is overkill, though. And (int)v in the original code is unnecessary, as v is already an int.

This should be more than enough and easy to comprehend:

int sign = -(v < 0);
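For comparison, a sketch of both approaches side by side (the function names are illustrative). The comparison version needs no assumption about padding bits or representation, while the shift version relies on unsigned int having no padding bits:

```c
#include <limits.h>  /* CHAR_BIT */

/* Sign via comparison: (v < 0) is 1 or 0, so this yields -1 or 0. */
static int sign_via_compare(int v)
{
    return -(v < 0);
}

/* The shift version from the question without the redundant (int)
   cast; assumes no padding bits in unsigned int. */
static int sign_via_shift(int v)
{
    return -(int)((unsigned int)v >> (sizeof(int) * CHAR_BIT - 1));
}
```

On implementations without padding bits the two should agree for every int value, including INT_MIN.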

It isn't. The Standard does not define the representation of integers, and therefore it's impossible to guarantee portably exactly what the result of that will be. The only way to get the sign of an integer is to do a comparison.

Notice: technical posts on this site are licensed under CC BY-SA 4.0; if you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM