(Why) is using an uninitialized variable undefined behavior?

If I have:

unsigned int x;
x -= x;

It's clear that x should be zero after this expression, but everywhere I look, they say the behavior of this code is undefined, not merely that the value of x is undefined (until before the subtraction).

Two questions:

  • Is the behavior of this code indeed undefined?
    (E.g. might the code crash [or worse] on a compliant system?)

  • If so, why does C say that the behavior is undefined, when it is perfectly clear that x should be zero here?

    I.e. what is the advantage given by not defining the behavior here?

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Yes, this behavior is undefined, but for different reasons than most people are aware of.

First, using an uninitialized value is by itself not undefined behavior; the value is simply indeterminate. Accessing it is then UB if the value happens to be a trap representation for the type. Unsigned types rarely have trap representations, so you would be relatively safe on that side.

What makes the behavior undefined is an additional property of your variable, namely that it "could have been declared with register", that is, its address is never taken. Such variables are treated specially because there are architectures whose real CPU registers have a sort of extra state that is "uninitialized" and that doesn't correspond to a value in the type domain.

Edit: The relevant phrase of the standard is 6.3.2.1p2:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

And to make it clearer, the following code is legal under all circumstances:

unsigned char a, b;
memcpy(&a, &b, 1);
a -= a;
  • Here the addresses of a and b are taken, so their value is just indeterminate.
  • Since unsigned char never has trap representations, that indeterminate value is just unspecified: any value of unsigned char could happen.
  • At the end, a must hold the value 0.

Edit2: a and b have unspecified values:

3.19.3 unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance

The C standard gives compilers a lot of latitude to perform optimizations. The consequences of these optimizations can be surprising if you assume a naive model of programs where uninitialized memory is set to some random bit pattern and all operations are carried out in the order they are written.

Note: the following examples are only valid because x never has its address taken, so it is “register-like”. They would also be valid if the type of x had trap representations; this is rarely the case for unsigned types (it requires “wasting” at least one bit of storage, and must be documented), and impossible for unsigned char. If x had a signed type, then the implementation could define the bit pattern that is not a number between -(2^(n-1) - 1) and 2^(n-1) - 1 as a trap representation. See Jens Gustedt's answer.

Compilers try to assign registers to variables, because registers are faster than memory. Since the program may use more variables than the processor has registers, compilers perform register allocation, which leads to different variables using the same register at different times. Consider the program fragment

unsigned x, y, z;   /* 0 */
y = 0;              /* 1 */
z = 4;              /* 2 */
x = - x;            /* 3 */
y = y + z;          /* 4 */
x = y + 1;          /* 5 */

When line 3 is evaluated, x is not initialized yet, therefore (reasons the compiler) line 3 must be some kind of fluke that can't happen due to other conditions that the compiler wasn't smart enough to figure out. Since z is not used after line 4, and x is not used before line 5, the same register can be used for both variables. So this little program is compiled to the following operations on registers:

r1 = 0;
r0 = 4;
r0 = - r0;
r1 += r0;
r0 = r1 + 1;

The final value of x is the final value of r0, and the final value of y is the final value of r1. These values are x = -3 and y = -4 (reduced modulo 2^N, since the variables are unsigned), and not 5 and 4 as would happen if x had been properly initialized.
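The register sequence above can be replayed with ordinary, fully initialized variables to check the arithmetic. This is only a sketch: the function names are made up for illustration, r0 and r1 are plain C variables standing in for registers, and line 5 is taken to be x = y + 1 as in the source fragment. Nothing here reads an uninitialized object, so the sketch itself is well defined.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: replays the register-level sequence, with r0
   shared between z and x. Every read is of an initialized value. */
void shared_register_result(unsigned *x_out, unsigned *y_out)
{
    unsigned r0, r1;
    r1 = 0;        /* line 1: y = 0 */
    r0 = 4;        /* line 2: z = 4 */
    r0 = -r0;      /* line 3: x = -x, but r0 still holds z */
    r1 += r0;      /* line 4: y = y + z, reading the clobbered register */
    r0 = r1 + 1;   /* line 5: x = y + 1 */
    *x_out = r0;
    *y_out = r1;
}

/* Hypothetical helper: the source fragment with x given a real
   starting value, each variable in its own storage. */
void initialized_result(unsigned x_init, unsigned *x_out, unsigned *y_out)
{
    unsigned x = x_init, y, z;
    y = 0;
    z = 4;
    x = -x;
    y = y + z;
    x = y + 1;
    *x_out = x;
    *y_out = y;
}

/* Compares the two outcomes; unsigned arithmetic wraps modulo 2^N. */
void compare_results(void)
{
    unsigned x, y;
    shared_register_result(&x, &y);
    assert(x == UINT_MAX - 2u);   /* "-3" as an unsigned value */
    assert(y == UINT_MAX - 3u);   /* "-4" as an unsigned value */
    initialized_result(42u, &x, &y);
    assert(x == 5u && y == 4u);
}
```

Whatever starting value is passed to initialized_result, it ends with x = 5 and y = 4, because x is overwritten on line 5; only the shared-register version lets line 3 leak into y.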

For a more elaborate example, consider the following code fragment:

unsigned i, x;
for (i = 0; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}

Suppose that the compiler detects that condition has no side effect. Since condition does not modify x, the compiler knows that the first run through the loop cannot possibly be accessing x since it is not initialized yet. Therefore the first execution of the loop body is equivalent to x = some_value(); there's no need to test the condition. The compiler may compile this code as if you'd written

unsigned i, x;
i = 0; /* if some_value() uses i */
x = some_value();
for (i = 1; i < 10; i++) {
    x = (condition() ? some_value() : -x);
}

The way this may be modeled inside the compiler is to consider that any value depending on x has whatever value is convenient as long as x is uninitialized. Because the behavior when an uninitialized variable is used is undefined, rather than the variable merely having an unspecified value, the compiler does not need to keep track of any special mathematical relationship between whatever-is-convenient values. Thus the compiler may analyze the code above in this way:

  • During the first loop iteration, x is uninitialized by the time -x is evaluated.
  • -x has undefined behavior, so its value is whatever-is-convenient.
  • The optimization rule condition ? value : value applies, so this code can be simplified to condition ; value.

When confronted with the code in your question, this same compiler analyzes that when x = - x is evaluated, the value of -x is whatever-is-convenient. So the assignment can be optimized away.
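To see that the rewritten loop is a genuinely different program once x has a defined value, the two shapes can be put side by side. This is a sketch under assumptions: condition and some_value are passed in as function pointers, the helper names and the stand-in callbacks never and seven are invented for illustration, and 0 is an arbitrary choice of initial value.

```c
#include <assert.h>

typedef int (*condition_fn)(void);
typedef unsigned (*value_fn)(void);

/* The source loop, with x given a defined starting value (0 here). */
unsigned loop_initialized(condition_fn condition, value_fn some_value)
{
    unsigned i, x = 0;
    for (i = 0; i < 10; i++) {
        x = (condition() ? some_value() : -x);
    }
    return x;
}

/* The shape the compiler may produce: the first iteration is replaced
   by an unconditional x = some_value(). */
unsigned loop_as_rewritten(condition_fn condition, value_fn some_value)
{
    unsigned i, x;
    x = some_value();
    for (i = 1; i < 10; i++) {
        x = (condition() ? some_value() : -x);
    }
    return x;
}

/* Hypothetical stand-ins for condition() and some_value(). */
static int never(void) { return 0; }
static unsigned seven(void) { return 7u; }

void compare_loops(void)
{
    /* Ten negations of 0 leave 0. */
    assert(loop_initialized(never, seven) == 0u);
    /* 7 negated nine times is -7 modulo 2^N. */
    assert(loop_as_rewritten(never, seven) == 0u - 7u);
}
```

The two functions disagree only because the first one initializes x. For the original uninitialized loop, the standard places no requirement on the result, which is what permits the rewrite.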

I haven't looked for an example of a compiler that behaves as described above, but it's the kind of optimizations good compilers try to do. I wouldn't be surprised to encounter one. Here's a less plausible example of a compiler with which your program crashes. (It may not be that implausible if you compile your program in some kind of advanced debugging mode.)

This hypothetical compiler maps every variable in a different memory page and sets up page attributes so that reading from an uninitialized variable causes a processor trap that invokes a debugger. Any assignment to a variable first makes sure that its memory page is mapped normally. This compiler doesn't try to perform any advanced optimization: it's in a debugging mode, intended to easily locate bugs such as uninitialized variables. When x = - x is evaluated, the right-hand side causes a trap and the debugger fires up.

Yes, the program might crash. There might, for example, be trap representations (specific bit patterns which cannot be handled) which might cause a CPU interrupt, which, if unhandled, could crash the program.

(6.2.6.1 on a late C11 draft says) Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

(This explanation only applies on platforms where unsigned int can have trap representations, which is rare on real world systems; see comments for details and referrals to alternate and perhaps more common causes which lead to the standard's current wording.)

(This answer addresses C 1999. For C 2011, see Jens Gustedt's answer.)

The C standard does not say that using the value of an object of automatic storage duration that is not initialized is undefined behavior. The C 1999 standard says, in 6.7.8 10, “If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate.” (This paragraph goes on to define how static objects are initialized, so the only uninitialized objects we are concerned about are automatic objects.)

3.17.2 defines “indeterminate value” as “either an unspecified value or a trap representation”. 3.17.3 defines “unspecified value” as “valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance”.

So, if the uninitialized unsigned int x has an unspecified value, then x -= x must produce zero. That leaves the question of whether it may be a trap representation. Accessing a trap value does cause undefined behavior, per 6.2.6.1 5.

Some types of objects may have trap representations, such as the signaling NaNs of floating-point numbers. But unsigned integers are special. Per 6.2.6.2, each of the N value bits of an unsigned int represents a power of 2, and each combination of the value bits represents one of the values from 0 to 2^N - 1. So unsigned integers can have trap representations only due to some values in their padding bits (such as a parity bit).

If, on your target platform, an unsigned int has no padding bits, then an uninitialized unsigned int cannot have a trap representation, and using its value cannot cause undefined behavior.
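Whether unsigned int has padding bits on a given platform can be checked from <limits.h>. The following is a minimal sketch with invented function names: since UINT_MAX is 2^N - 1, counting its set bits gives the number of value bits N, and whatever storage remains is padding.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Number of value bits N in unsigned int: UINT_MAX is 2^N - 1, so
   shifting it right until it reaches zero takes exactly N steps. */
size_t unsigned_int_value_bits(void)
{
    size_t bits = 0;
    unsigned v = UINT_MAX;
    while (v != 0) {
        bits++;
        v >>= 1;
    }
    return bits;
}

/* Padding bits are the storage bits that are not value bits. */
size_t unsigned_int_padding_bits(void)
{
    return sizeof(unsigned) * CHAR_BIT - unsigned_int_value_bits();
}

/* Sanity check: value bits plus padding bits account for all storage,
   and the standard requires at least 16 value bits. */
void check_bit_accounting(void)
{
    assert(unsigned_int_value_bits() >= 16);
    assert(unsigned_int_value_bits() + unsigned_int_padding_bits()
           == sizeof(unsigned) * CHAR_BIT);
}
```

A result of 0 from unsigned_int_padding_bits() is the situation described above. It only rules out trap representations; it says nothing about the separate "register-like" rule that other answers cite from C11.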

Yes, it's undefined. The code can crash. C says the behavior is undefined because there's no specific reason to make an exception to the general rule. The advantage is the same advantage as in all other cases of undefined behavior: the compiler doesn't have to output special code to make this work.

Clearly, the compiler could simply use whatever garbage value it deemed "handy" inside the variable, and it would work as intended... what's wrong with that approach?

Why do you think that doesn't happen? That's exactly the approach taken. The compiler isn't required to make it work, but it isn't required to make it fail either.

For any variable of any type that is not initialized, or that for other reasons holds an indeterminate value, the following applies to code reading that value:

  • In case the variable has automatic storage duration and does not have its address taken, the code always invokes undefined behavior [1].
  • Otherwise, in case the system supports trap representations for the given variable type, the code always invokes undefined behavior [2].
  • Otherwise, if there are no trap representations, the variable takes an unspecified value. There is no guarantee that this unspecified value is consistent each time the variable is read. However, it is guaranteed not to be a trap representation and it is therefore guaranteed not to invoke undefined behavior [3].

    The value can then be safely used without causing a program crash, although such code is not portable to systems with trap representations.


[1]: C11 6.3.2.1:

If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

[2]: C11 6.2.6.1:

Certain object representations need not represent a value of the object type. If the stored value of an object has such a representation and is read by an lvalue expression that does not have character type, the behavior is undefined. If such a representation is produced by a side effect that modifies all or any part of the object by an lvalue expression that does not have character type, the behavior is undefined. Such a representation is called a trap representation.

[3]: C11:

3.19.2
indeterminate value
either an unspecified value or a trap representation

3.19.3
unspecified value
valid value of the relevant type where this International Standard imposes no requirements on which value is chosen in any instance
NOTE An unspecified value cannot be a trap representation.

3.19.4
trap representation
an object representation that need not represent a value of the object type

While many answers focus on processors that trap on uninitialized-register access, quirky behaviors can arise even on platforms which have no such traps, using compilers that make no particular effort to exploit UB. Consider the code:

volatile uint32_t a,b;
uint16_t moo(uint32_t x, uint16_t y, uint32_t z)
{
  uint16_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z;
  return temp;  
}

A compiler for a platform like the ARM, where all instructions other than loads and stores operate on 32-bit registers, might reasonably process the code in a fashion equivalent to:

volatile uint32_t a,b;
// Note: y is known to be 0..65535
// x, y, and z are received in 32-bit registers r0, r1, r2
uint32_t moo(uint32_t x, uint32_t y, uint32_t z)
{
  // Since x is never used past this point, and since the return value
  // will need to be in r0, a compiler could map temp to r0
  uint32_t temp;
  if (a)
    temp = y;
  else if (b)
    temp = z & 0xFFFF;
  return temp;  
}

If either volatile read yields a non-zero value, r0 will get loaded with a value in the range 0..65535. Otherwise r0 will still hold whatever it held when the function was called (i.e. the value passed into x), which might not be a value in the range 0..65535. The Standard lacks any terminology to describe the behavior of a value whose type is uint16_t but whose value is outside the range 0..65535, except to say that any action which could produce such behavior invokes UB.
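For comparison, here is a sketch of the same function with temp given a defined fallback, so every path returns a value in 0..65535. The name moo_initialized, the fallback value 0, and the helper moo_demo are assumptions made for illustration, not something the original code specifies.

```c
#include <assert.h>
#include <stdint.h>

volatile uint32_t a, b;

/* Variant of moo() in which temp always holds a defined value. */
uint16_t moo_initialized(uint32_t x, uint16_t y, uint32_t z)
{
    uint16_t temp = 0;        /* assumed fallback when a and b are both zero */
    (void)x;                  /* x is unused, as in the original */
    if (a)
        temp = y;
    else if (b)
        temp = (uint16_t)z;   /* explicit truncation to 16 bits */
    return temp;
}

/* Exercises the three paths by setting the volatile flags. */
void moo_demo(void)
{
    a = 0; b = 0;
    assert(moo_initialized(0x12345678u, 1, 2) == 0);   /* fallback, not x */
    a = 1;
    assert(moo_initialized(0x12345678u, 1, 2) == 1);   /* returns y */
    a = 0; b = 1;
    assert(moo_initialized(0, 1, 0x10002u) == 2);      /* z truncated */
}
```

With the initializer in place, a compiler can still map temp to r0, but it has to load the fallback into that register first, so the caller's x can no longer leak into the result.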
