
Why does left shift operation invoke Undefined Behaviour when the left side operand has negative value?

In C, the bitwise left shift operation invokes undefined behaviour when the left operand has a negative value.

Relevant quote from ISO C99 (6.5.7/4):

The result of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are filled with zeros. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. If E1 has a signed type and nonnegative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

But in C++ the behaviour is well defined.

ISO C++03 (5.8/2):

The value of E1 << E2 is E1 (interpreted as a bit pattern) left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 multiplied by the quantity 2 raised to the power E2, reduced modulo ULONG_MAX+1 if E1 has type unsigned long, UINT_MAX+1 otherwise. [Note: the constants ULONG_MAX and UINT_MAX are defined in the header <climits>. ]

That means

int a = -1, b = 2, c;
c = a << b;

invokes undefined behaviour in C, but the behaviour is well defined in C++.

What forced the ISO C++ committee to consider that behaviour well defined, as opposed to the behaviour in C?

On the other hand, the behaviour is implementation-defined for the bitwise right shift operation when the left operand is negative, right?

My question is: why does the left shift operation invoke undefined behaviour in C, and why does the right shift operator invoke only implementation-defined behaviour?

PS: Please don't give answers like "It is undefined behaviour because the Standard says so". :P

The paragraph you copied is talking about unsigned types. The behavior is undefined in C++. From the last C++0x draft:

The value of E1 << E2 is E1 left-shifted E2 bit positions; vacated bits are zero-filled. If E1 has an unsigned type, the value of the result is E1 × 2^E2, reduced modulo one more than the maximum value representable in the result type. Otherwise, if E1 has a signed type and non-negative value, and E1 × 2^E2 is representable in the result type, then that is the resulting value; otherwise, the behavior is undefined.

EDIT: I had a look at the C++98 paper. It just doesn't mention signed types at all. So it's still undefined behavior.

Right-shifting a negative value is implementation-defined, right. Why? In my opinion: it's easy to make that implementation-defined because there are no truncation issues on the left. When you shift left, you must say not only what's shifted in from the right but also what happens with the rest of the bits, e.g. with two's complement representation, which is another story.

In C bitwise left shift operation invokes Undefined Behaviour when the left side operand has negative value. [...] But in C++ the behaviour is well defined. [...] why [...]

The easy answer is: Because the standards say so.

A longer answer is: It probably has something to do with the fact that C and C++ both allow other representations for negative numbers besides 2's complement. Giving fewer guarantees about what's going to happen makes it possible to use the languages on other hardware, including obscure and/or old machines.

For some reason, the C++ standardization committee felt like adding a little guarantee about how the bit representation changes. But since negative numbers may still be represented via 1's complement or sign+magnitude, the possible resulting values still vary.

Assuming 16-bit ints, we'll have

 -1 = 1111111111111111  // 2's complement
 -1 = 1111111111111110  // 1's complement
 -1 = 1000000000000001  // sign+magnitude

Shifted to the left by 3, we'll get

 -8 = 1111111111111000  // 2's complement
-15 = 1111111111110000  // 1's complement
  8 = 0000000000001000  // sign+magnitude
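
The two tables above can be reproduced with a small simulation. This is only a sketch: the helper names are made up for illustration, and the patterns are shifted as unsigned 16-bit values so that the C code itself stays well defined whatever the host machine uses.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers: decode a 16-bit pattern under each of the three
   signed representations discussed above. */
static long decode_twos_complement(uint16_t bits) {
    return (bits & 0x8000u) ? (long)bits - 65536L : (long)bits;
}

static long decode_ones_complement(uint16_t bits) {
    /* With the sign bit set, the value is minus the complemented pattern. */
    return (bits & 0x8000u) ? -(long)(uint16_t)~bits : (long)bits;
}

static long decode_sign_magnitude(uint16_t bits) {
    long magnitude = (long)(bits & 0x7FFFu);
    return (bits & 0x8000u) ? -magnitude : magnitude;
}

/* Shift the raw pattern left and drop the bits that leave the 16-bit word. */
static uint16_t shift_pattern_left(uint16_t bits, unsigned count) {
    assert(count < 16);
    return (uint16_t)((uint32_t)bits << count);
}
```

Feeding in the three encodings of -1 (0xFFFF, 0xFFFE and 0x8001) with a shift count of 3 and decoding each result under its own representation gives -8, -15 and 8.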

What forced the ISO C++ committee to consider that behaviour well defined as opposed to the behaviour in C?

I guess they made this guarantee so that you can use << appropriately when you know what you're doing (i.e. when you're sure your machine uses 2's complement).

On the other hand the behaviour is implementation defined for bitwise right shift operation when the left operand is negative, right?

I'd have to check the standard. But you may be right. A right shift without sign extension on a 2's complement machine isn't particularly useful. So the current state is definitely better than requiring vacated bits to be zero-filled, because it leaves room for machines that do sign extension, even though sign extension is not guaranteed.
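
If sign extension is wanted regardless of what the implementation picks for >> on negative values, it can be built from shifts of non-negative values only. The following is a minimal sketch, not a standard facility: the function name is hypothetical and it assumes a two's-complement int without padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: right shift with sign extension.
   Assumes two's complement, so ~value is non-negative when value < 0. */
static int shift_right_sign_extend(int value, unsigned count) {
    assert(count < sizeof(int) * CHAR_BIT);
    if (value >= 0) {
        return value >> count;      /* well defined for non-negative values */
    }
    return ~(~value >> count);      /* shift the complement, then flip back */
}
```

Under that assumption the result rounds toward negative infinity, for example shift_right_sign_extend(-7, 1) gives -4.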

To answer your real question as stated in the title: as for any operation on a signed type, this has undefined behavior if the result of the mathematical operation doesn't fit in the target type (under- or overflow). Signed integer types are designed like that.

For the left shift operation, if the value is positive or 0, the definition of the operator as a multiplication by a power of 2 makes sense, so everything is OK unless the result overflows. Nothing surprising.

If the value is negative, you could have the same interpretation as multiplication by a power of 2, but if you just think in terms of bit shifts, this might be surprising. Obviously the standards committee wanted to avoid such ambiguity.

My conclusion:

  • if you want to do real bit pattern operations, use unsigned types
  • if you want to multiply a value (signed or not) by a power of two, do just that, something like

    i * (1u << k)

Your compiler will transform this into decent assembly in any case.
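
The two recommendations can be sketched as a pair of helpers. The names, the fixed-width types and the preconditions are assumptions made for illustration. One deliberate difference from the one-liner above: when i is an int, i * (1u << k) is evaluated in unsigned arithmetic under the usual arithmetic conversions, so this sketch multiplies by a signed constant in a wider type instead.

```c
#include <assert.h>
#include <stdint.h>

/* Bit-pattern work: stay unsigned, where bits shifted out are simply lost. */
static uint32_t shift_bits_left(uint32_t bits, unsigned count) {
    assert(count < 32);
    return bits << count;
}

/* Arithmetic: multiply by a power of two instead of shifting.
   The 64-bit result cannot overflow for a 32-bit value and exponent < 31. */
static int64_t times_power_of_two(int32_t value, unsigned exponent) {
    assert(exponent < 31);
    return (int64_t)value * ((int64_t)1 << exponent);
}
```

For example, times_power_of_two(-3, 4) yields -48 without ever left-shifting a negative value.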

A lot of these kinds of things are a balance between what common CPUs can actually support in a single instruction and what's useful enough to expect compiler writers to guarantee even if it takes extra instructions. Generally, a programmer using bit-shifting operators expects them to map to single instructions on CPUs that have such instructions. That's why there is undefined or implementation-defined behaviour where CPUs handled "edge" conditions in various ways, rather than a mandated behaviour that would make the operation unexpectedly slow. Keep in mind that the additional pre/post-processing or handling instructions might have to be emitted even for the simpler use cases.

Undefined behaviour may have been necessary where some CPUs generated traps/exceptions/interrupts (as distinct from C++ try/catch type exceptions) or generally useless/inexplicable results. If instead all the CPUs considered by the Standards Committee at the time provided at least some defined behaviour, then the committee could make the behaviour implementation-defined.

My question is why does left shift operation invoke Undefined Behaviour in C and why does right shift operator invoke just Implementation defined behaviour?

The folks at LLVM speculate that the shift operator has constraints because of the way the instruction is implemented on various platforms. From What Every C Programmer Should Know About Undefined Behavior #1/3:

... My guess is that this originated because the underlying shift operations on various CPUs do different things with this: for example, X86 truncates 32-bit shift amount to 5 bits (so a shift by 32-bits is the same as a shift by 0-bits), but PowerPC truncates 32-bit shift amounts to 6 bits (so a shift by 32 produces zero). Because of these hardware differences, the behavior is completely undefined by C...
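
Code that cannot rule out an oversized shift count can test it before shifting. A minimal sketch with hypothetical function names: the first states the language rule as a precondition, the second picks "shift everything out" as the answer for large counts, which is a design choice for this example rather than anything the standard mandates.

```c
#include <assert.h>
#include <stdint.h>

/* The language rule as a precondition: the count must be below the width. */
static uint32_t shift_left_strict(uint32_t value, unsigned count) {
    assert(count < 32);
    return value << count;
}

/* Total version: counts of 32 or more shift every bit out. */
static uint32_t shift_left_any_count(uint32_t value, unsigned count) {
    return (count < 32) ? (value << count) : 0u;
}
```

With the explicit test, shift_left_any_count(1u, 32) returns 0 on every platform instead of depending on how the CPU truncates the count.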

Note that the discussion was about shifting by an amount greater than the register size. But it's the closest I've found to an explanation of the shift constraints from an authority.

I think a second reason is the potential sign change on a 2's complement machine. But I've never read it anywhere (no offense to @sellibitze (and I happen to agree with him)).

The behavior in C++03 is the same as in C++11 and C99; you just need to look beyond the rule for left-shift.

Section 5p5 of the Standard says that:

If during the evaluation of an expression, the result is not mathematically defined or not in the range of representable values for its type, the behavior is undefined

The left-shift expressions that C99 and C++11 specifically call out as undefined behavior are the same ones that evaluate to a result outside the range of representable values.

In fact, the sentence about unsigned types using modular arithmetic is there specifically to avoid generating values outside the representable range, which would automatically be undefined behavior.
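
The conditions in the quoted C99 and C++0x wording can be written out as a predicate. This is a sketch under stated assumptions: the function names are hypothetical, int is assumed to have no padding bits, and the shift-count check comes from the general rule that the count must be non-negative and less than the width.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Hypothetical predicate: true when `value << count` on int falls in the
   cases the quoted wording defines. */
static bool left_shift_is_defined(int value, int count) {
    const int width = (int)(sizeof(int) * CHAR_BIT);  /* assumes no padding */
    if (count < 0 || count >= width) return false;    /* count out of range */
    if (value < 0) return false;                       /* negative operand */
    return value <= (INT_MAX >> count);                /* value * 2^count fits */
}

/* Shift only after the precondition has been checked. */
static int checked_left_shift(int value, int count) {
    assert(left_shift_is_defined(value, count));
    return value << count;
}
```

The unsigned case needs no such predicate for the value: UINT_MAX << 1 simply wraps to UINT_MAX - 1.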

In C89, the behavior of left-shifting negative values was unambiguously defined on two's-complement platforms which did not use padding bits on signed and unsigned integer types. The value bits that signed and unsigned types had in common had to be in the same places, and the only place the sign bit for a signed type could go was in the same place as the upper value bit for unsigned types, which in turn had to be to the left of everything else.

The C89 mandated behaviors were useful and sensible for two's-complement platforms without padding, at least in cases where treating them as multiplication would not cause overflow. The behavior may not have been optimal on other platforms, or on implementations that seek to reliably trap signed integer overflow. The authors of C99 probably wanted to allow implementations flexibility in cases where the C89 mandated behavior would have been less than ideal, but nothing in the rationale suggests an intention that quality implementations shouldn't continue to behave in the old fashion in cases where there was no compelling reason to do otherwise.

Unfortunately, even though there have never been any implementations of C99 that don't use two's-complement math, the authors of C11 declined to define the common-case (non-overflow) behavior; IIRC, the claim was that doing so would impede "optimization". Having the left-shift operator invoke Undefined Behavior when the left-hand operand is negative allows compilers to assume that the shift will only be reachable when the left-hand operand is non-negative.

I'm dubious as to how often such optimizations are genuinely useful, but the rarity of such usefulness actually weighs in favor of leaving the behavior undefined. If the only situations where two's-complement implementations wouldn't behave in commonplace fashion are those where the optimization would actually be useful, and if no such situations actually exist, then implementations would behave in commonplace fashion with or without a mandate, and there's no need to mandate the behavior.

The result of shifting depends upon the numeric representation. Shifting behaves like multiplication only when numbers are represented as two's complement. But the problem is not exclusive to negative numbers. Consider a 4-bit signed number represented in excess-8 (aka offset binary). The number 1 is represented as 1+8, or 1001. If we left-shift this as bits, we get 0010, which is the representation for -6. Similarly, -1 is represented as -1+8, or 0111, which becomes 1110 when left-shifted, the representation for +6. The bitwise behavior is well-defined, but the numeric behavior is highly dependent on the system of representation.
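
The excess-8 arithmetic above is easy to check mechanically. A minimal sketch, with hypothetical helper names, that stores the 4-bit patterns in the low bits of an unsigned value:

```c
#include <assert.h>

/* Hypothetical helpers for a 4-bit excess-8 (offset binary) encoding. */
static unsigned encode_excess8(int value) {
    assert(value >= -8 && value <= 7);
    return (unsigned)(value + 8);
}

static int decode_excess8(unsigned bits) {
    return (int)(bits & 0xFu) - 8;
}

/* Left shift within a 4-bit word: bits leaving the word are discarded. */
static unsigned shift_left_4bit(unsigned bits, unsigned count) {
    assert(count < 4);
    return (bits << count) & 0xFu;
}
```

Decoding shift_left_4bit(encode_excess8(1), 1) gives -6, and doing the same for -1 gives +6, matching the patterns 0010 and 1110 worked out above.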
