Unexpected C/C++ bitwise shift operators outcome

I think I'm going insane with this.

I have a piece of code that needs to create an (unsigned) integer with N consecutive bits set to 1. To be exact, I have a bitmask, and in some situations I'd like to set it to a solid range.

I have the following function:

void MaskAddRange(UINT& mask, UINT first, UINT count)
{
    mask |= ((1 << count) - 1) << first;
}

In simple words: 1 << count in binary representation is 100...000 (the number of zeroes is count), subtracting 1 from such a number gives 011...111, and then we just left-shift it by first.
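For counts strictly below 32 the formula can be checked directly. The following is a minimal sketch, assuming UINT is a 32-bit unsigned type (written here as uint32_t); the helper name mask_range_small is hypothetical and does not appear in the question. The literal is unsigned so that the arithmetic never goes through a signed int.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: `count` consecutive 1 bits starting at bit `first`.
   Only valid while both shift counts stay below 32. */
uint32_t mask_range_small(uint32_t first, uint32_t count)
{
    assert(count < 32 && first < 32 && first + count <= 32);
    uint32_t ones = (UINT32_C(1) << count) - 1;  /* 100...000 - 1 = 011...111 */
    return (uint32_t)(ones << first);            /* move the run into position */
}
```

For example, mask_range_small(4, 3) gives 0x70: (1 << 3) - 1 is 7, and 7 << 4 is 0x70.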

The above should yield the correct result when the following obvious limitation is met:

first + count <= sizeof(UINT)*8 = 32

Note that it should also work correctly for "extreme" cases.

  • if count = 0 we have (1 << count) = 1, and hence ((1 << count) - 1) = 0.
  • if count = 32 we have (1 << count) = 0, since the leading bit overflows, and according to C/C++ rules bitwise shift operators are not cyclic. Then ((1 << count) - 1) = -1 (all bits set).

However, as it turned out, for count = 32 the formula doesn't work as expected. As discovered:

UINT n = 32;
UINT x = 1 << n;
// the value of x is 1

Moreover, I'm using the MSVC2005 IDE. When I evaluate the above expression in the debugger, the result is 0. However, when I step over the above line, x gets the value 1. Looking via the disassembler we see the following:

mov eax,1 
mov ecx,dword ptr [ebp-0Ch] // ecx = n
shl eax,cl                  // eax <<= LOBYTE(ecx)
mov dword ptr [ebp-18h],eax // x = eax

There's no magic indeed, the compiler just used the shl instruction. So it seems that shl doesn't do what I expected it to do. Either the CPU decides to ignore this instruction, or the shift is treated modulo 32, or I don't know what.

My questions are:

  • What is the correct behavior of the shl / shr instructions?
  • Is there a CPU flag controlling the bitshift instructions?
  • Is this according to the C/C++ standard?

Thanks in advance

Edit:

Thanks for the answers. I've realized that (1) shl / shr indeed treat the operand modulo 32 (or & 0x1F) and (2) the C/C++ standard treats a shift by more than 31 bits as undefined behavior.

Then I have one more question. How can I rewrite my "masking" expression to cover this extreme case too? It should be without branching (if, ?). What would be the simplest expression?

1U << 32 is undefined behavior in C and in C++ when type unsigned int is 32 bits wide.

(C11, 6.5.7p3) "If the value of the right operand is negative or is greater than or equal to the width of the promoted left operand, the behavior is undefined"

(C++11, 5.8p1) "The behavior is undefined if the right operand is negative, or greater than or equal to the length in bits of the promoted left operand."

Shifting by as many bits as there are in the integer type you're shifting, or by more, is undefined in C and C++. On x86 and x86_64, the shift amount of the shift instructions is indeed treated modulo 32 (modulo 64 for 64-bit operands). However, you cannot rely on your compiler generating this modulo behaviour from C or C++ >> / << operations unless the compiler explicitly guarantees it in its documentation.
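One way to stay inside the defined range without branching is to do the shifts in a wider type. This is only a sketch, assuming a 64-bit unsigned type is available and that first + count <= 32 holds; the name mask_range_wide is hypothetical and not taken from the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: every shift is applied to a 64-bit value, so a
   shift count of 32 is still below the width of the shifted operand. */
uint32_t mask_range_wide(uint32_t first, uint32_t count)
{
    /* Debug-only check of the stated precondition; compiled out with NDEBUG. */
    assert(first <= 32 && count <= 32 && first + count <= 32);
    uint64_t ones = (UINT64_C(1) << count) - 1;  /* count == 32 gives 0xFFFFFFFF */
    return (uint32_t)(ones << first);            /* result fits in the low 32 bits */
}
```

With this shape, mask_range_wide(0, 32) gives 0xFFFFFFFF and mask_range_wide(0, 0) gives 0, which are the two "extreme" cases from the question.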

I think the expression 1 << 32 is the same as 1 << 0. The IA-32 Instruction Set Reference says that the count operand of shift instructions is masked to 5 bits.
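The 5-bit masking can be modelled explicitly in C. The sketch below only illustrates what the hardware does with the count; it is not something the C or C++ standards promise for the << operator, and the names shl32_masked and shl32_masked_demo are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Models a 32-bit shl as described above: only the low 5 bits of the
   count are used, so the effective shift is always in the range 0..31. */
uint32_t shl32_masked(uint32_t value, uint32_t count)
{
    return (uint32_t)(value << (count & 0x1Fu));
}

/* Small self-check of the masking behaviour. */
void shl32_masked_demo(void)
{
    assert(shl32_masked(1u, 32u) == 1u);  /* 32 & 0x1F == 0, same as 1 << 0 */
    assert(shl32_masked(1u, 33u) == 2u);  /* 33 & 0x1F == 1, same as 1 << 1 */
}
```

This matches the value of 1 that the question observed for x when n was 32.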

The instruction set reference of IA-32 architectures can be found here.

To fix the "extreme" case, I can only come up with the following code (maybe buggy), which may be a little awkward:

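void MaskAddRange(UINT *mask, UINT first, UINT count) {
    /* Assumes 0 <= count <= 32 and a 32-bit UINT. */
    UINT count2 = ((count & 0x20) >> 5);   /* 1 only when count == 32 */
    UINT count1 = count - count2;          /* never exceeds 31 */
    /* 1u keeps the shifts unsigned, so shifting into bit 31 is well defined. */
    *mask |= (((1u << count1) << count2) - 1) << first;
}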

The basic idea is to split the shift operation so that each shift count does not exceed 31. Note that the above code assumes that count is in the range 0..32, so it is not very robust.

If I have understood the requirements, you want an unsigned int with the top N bits set?

There are several ways to get the result (I think) you want. Edit: I am worried that this isn't very robust, and will fail for n > 32:

uint32_t set_top_n(uint32_t n)
{
    static uint32_t value[33] = { ~0xFFFFFFFF, ~0x7FFFFFFF, ~0x3FFFFFFF, ~0x1FFFFFFF,
                                  ~0x0FFFFFFF, ~0x07FFFFFF, ~0x03FFFFFF, ~0x01FFFFFF,
                                  ~0x00FFFFFF, ~0x007FFFFF, ~0x003FFFFF, ~0x001FFFFF,
                                  // you get the idea
                                  0xFFFFFFFF
                                  };
    return value[n & 0x3f];
}

This should be quite fast, as it is only 132 bytes of data.

To make it robust, I'd either extend it for all values up to 63, or make it conditional, in which case it can be done with a version of your original bit-masking plus the 32 case.
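A conditional variant along those lines might look like the sketch below. It is not the answerer's code: the name set_top_n_checked and the choice to saturate at all ones for n >= 32 are assumptions made for illustration. The demo function compares it against the table entries that are visible in the answer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical conditional variant: sets the top n bits of a 32-bit word.
   Any n >= 32 is treated like 32 (assumption), so no shift reaches 32. */
uint32_t set_top_n_checked(uint32_t n)
{
    if (n >= 32)
        return UINT32_C(0xFFFFFFFF);
    return (uint32_t)~(UINT32_C(0xFFFFFFFF) >> n);
}

/* Cross-check against entries 0, 1, 11 and 32 of the lookup table. */
void set_top_n_checked_demo(void)
{
    assert(set_top_n_checked(0)  == (uint32_t)~UINT32_C(0xFFFFFFFF));
    assert(set_top_n_checked(1)  == (uint32_t)~UINT32_C(0x7FFFFFFF));
    assert(set_top_n_checked(11) == (uint32_t)~UINT32_C(0x001FFFFF));
    assert(set_top_n_checked(32) == UINT32_C(0xFFFFFFFF));
}
```

This trades the 132-byte table for one comparison, which may or may not compile to a branch depending on the compiler.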

My 32 cents:

#include <limits.h>

#define INT_BIT     (CHAR_BIT * sizeof(int))

unsigned int set_bit_range(unsigned int n, int frm, int cnt)
{
        return n | ((~0u >> (INT_BIT - cnt)) << frm);
}

List 1.

A safe version with a bogus / semi-circular result could be:

unsigned int set_bit_range(unsigned int n, int f, int c)
{
        return n | (~0u >> (c > INT_BIT ? 0 : INT_BIT - c)) << (f % INT_BIT);
}

List 2.

Doing this without branching or local variables could be something like:

return n | (~0u >> ((INT_BIT - c) % INT_BIT)) << (f % INT_BIT);

List 3.
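The modulo arithmetic of List 3 has one consequence worth spelling out: a count of 0 is reduced to a shift of 0, so it behaves like a full-width count. The sketch below restates List 3 with unsigned parameters (an assumption made here so that the modulo is well defined for any input) under the hypothetical names UINT_BITS, set_bit_range_mod and set_bit_range_mod_demo.

```c
#include <assert.h>
#include <limits.h>

/* Width of unsigned int in bits, assumed to be a power of two (e.g. 32). */
#define UINT_BITS ((unsigned)(CHAR_BIT * sizeof(unsigned)))

/* List 3 with unsigned parameters: both shift counts are reduced modulo
   the width, so neither shift can be out of range. */
unsigned set_bit_range_mod(unsigned n, unsigned f, unsigned c)
{
    return n | ((~0u >> ((UINT_BITS - c) % UINT_BITS)) << (f % UINT_BITS));
}

void set_bit_range_mod_demo(void)
{
    /* Ordinary case: 3 bits starting at bit 4. */
    assert(set_bit_range_mod(0u, 4u, 3u) == 0x70u);
    /* c == 0 shifts right by 0, so it sets every bit from f upward,
       exactly like c == UINT_BITS. */
    assert(set_bit_range_mod(0u, 0u, 0u) == ~0u);
    assert(set_bit_range_mod(0u, 0u, UINT_BITS) == ~0u);
}
```

So this form never invokes undefined behaviour, but a caller that needs count = 0 to set nothing still has to handle that case separately.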

List 2 and List 3 would give the "correct" result as long as from is less than INT_BIT and >= 0, i.e.:

./bs 1761 26 810
Setting bits from 26 count 810 in 1761 -- of 32 bits
Trying to set bits out of range, set bits from 26 to 836 in 32 sized range
x = ~0u       =  1111 1111 1111 1111 1111 1111 1111 1111

Unsafe version:
x = x >> -778 =  0000 0000 0000 0000 0000 0011 1111 1111
x = x <<  26  =  1111 1100 0000 0000 0000 0000 0000 0000
x v1 Result   =  1111 1100 0000 0000 0000 0110 1110 0001
Original:        0000 0000 0000 0000 0000 0110 1110 0001    

Safe version, branching:
x = x >>   0  =  1111 1111 1111 1111 1111 1111 1111 1111
x = x <<  26  =  1111 1100 0000 0000 0000 0000 0000 0000
x v2 Result   =  1111 1100 0000 0000 0000 0110 1110 0001
Original:        0000 0000 0000 0000 0000 0110 1110 0001    

Safe version, modulo:
x = x >>  22  =  0000 0000 0000 0000 0000 0011 1111 1111
x = x <<  26  =  1111 1100 0000 0000 0000 0000 0000 0000
x v3 Result   =  1111 1100 0000 0000 0000 0110 1110 0001
Original:        0000 0000 0000 0000 0000 0110 1110 0001

You could avoid the undefined behavior by splitting the shift operation into two steps, the first one by (count - 1) bits and the second one by 1 more bit. Special care is needed in case count is zero, however:

void MaskAddRange(UINT& mask, UINT first, UINT count)
{
  if (count == 0) return;
  mask |= ((1u << (count - 1) << 1) - 1) << first;
}
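The same splitting idea can also cover count = 0 without the early return if the count is divided into two halves instead of (count - 1) and 1. This is a sketch under the assumptions that UINT is a 32-bit unsigned int and that first stays below 32; the name mask_range_split is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical variant: splits `count` into two halves of at most 16 each,
   so neither shift can reach the 32-bit width, even for count == 32. */
uint32_t mask_range_split(uint32_t first, uint32_t count)
{
    /* Debug-only precondition check; compiled out with NDEBUG. */
    assert(count <= 32 && first < 32 && first + count <= 32);
    uint32_t lo = count >> 1;   /* 0..16 */
    uint32_t hi = count - lo;   /* 0..16 */
    /* For count == 32 the two shifts push the 1 out of the word, leaving 0. */
    uint32_t top = (uint32_t)((UINT32_C(1) << lo) << hi);
    uint32_t ones = top - 1u;   /* 0 - 1 wraps to all ones; 1 - 1 gives 0 */
    return (uint32_t)(ones << first);
}
```

Here mask_range_split(0, 32) gives 0xFFFFFFFF and mask_range_split(0, 0) gives 0, with no if or ?: in the mask computation itself.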
