
Function returns the address of a local variable, but it still compiles in C. Why?

Even though I get a warning that the function returns the address of a local variable, it compiles. Isn't that then UB on the compiler's part? The generated assembly:

    .text
.LC0:
    .asciz "%i\n"
    .globl  foo
    .type   foo, @function
foo:
    pushq   %rbp    #
    movq    %rsp, %rbp  #,
    sub     $16, %rsp   #,
    mov     %rdi, -8(%rbp)   #,
    leaq    -8(%rbp), %rax   #,
# a.c:5: }
    leave 
    ret
    .size   foo, .-foo
    .globl  main
    .type   main, @function
main:
    pushq   %rbp    #
    movq    %rsp, %rbp  #,
# a.c:8:    foo();
    movl    $123, %edi  #,
    call    foo #
    movq    (%rax), %rsi   #,
    leaq    .LC0(%rip), %rdi   #,
    movl    $0, %eax   #,
    call    printf #,
    movl $0, %eax
# a.c:9: }
    popq    %rbp    #
    ret 
    .size   main, .-main
    .ident  "GCC: (Debian 8.3.0-6) 8.3.0"
    .section    .note.GNU-stack,"",@progbits

Here the assembly returns the address of a local variable with leaq -8(%rbp), %rax, but then it executes the leave instruction, which should "invalidate" the address -8(%rbp) (the stack pointer is moved back up, so I should no longer be able to dereference that address, since the program has moved on). So why does it compile, and happily dereference the pointer with movq (%rax), %rsi, when the address returned in %rax is no longer valid? Shouldn't it segfault or terminate?

Even though I get a warning that the function returns the address of a local variable, it compiles. Isn't that then UB on the compiler's part?

No, but if it were, how could you tell? You seem to have a misunderstanding of undefined behavior. It does not mean "the compiler must reject it", "the compiler must warn about it", "the program must terminate", or any such thing. Those may indeed be manifestations of UB, but if the language specification required such behavior then it wouldn't be undefined. Ensuring that a C program does not exercise undefined behavior is the responsibility of the programmer, not the C implementation. Where a programmer does not fulfill that responsibility, the C implementation explicitly has no reciprocal responsibility: it can do anything within its capabilities.

Moreover, there is no single "the" C compiler. Different compilers may do things differently and still conform to the C language specification. This is where implementation-defined, unspecified, and undefined behaviors come in. Allowing such variance is intentional on the part of the C language designers. Among other things, it allows implementations to operate in ways that are natural for their particular target hardware and execution environments.

Now let's go back to "no". Here is a prototypical example of a function returning the address of an automatic variable:

int *foo() {
    int bar = 0;
    return &bar;
}

What about that is supposed to have undefined behavior? It is well defined for the function to compute the address of bar, and the resulting pointer value has the correct type to be returned by the function. Once bar's lifetime ends when the function returns, the return value becomes indeterminate (paragraph 6.2.4/2 of the standard), but that does not in itself give rise to any undefined behavior.
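For contrast, here is a minimal sketch of two variants whose returned pointers do refer to a live object after the call. The names foo_static, foo_alloc, and read_both are hypothetical and not part of the original answer; they only illustrate how storage duration, rather than the act of returning an address, decides whether the pointer remains usable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical variant: bar has static storage duration, so it outlives
   every call and the returned pointer stays valid. */
int *foo_static(void) {
    static int bar = 0;
    return &bar;
}

/* Hypothetical variant: allocated storage lives until free() is called.
   Returns NULL if the allocation fails; the caller owns the result. */
int *foo_alloc(void) {
    int *bar = malloc(sizeof *bar);
    if (bar != NULL) {
        *bar = 0;
    }
    return bar;
}

/* Usage sketch: both pointers may be dereferenced after the calls return. */
int read_both(void) {
    int *s = foo_static();
    int *a = foo_alloc();
    assert(a != NULL);  /* allocation failure is not handled in this sketch */
    int sum = *s + *a;
    free(a);
    return sum;
}
```

Both variants have costs of their own (shared state in one case, an ownership obligation in the other), so this is an illustration of lifetimes rather than a recommendation.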

Or consider a caller:

void test1() {
    int *bar_ptr = foo();  // OK under all circumstances
}

As already discussed, our particular foo()'s return value will always be indeterminate, so in particular, it might be a trap representation. But that's a runtime consideration, not a compile-time one. And even if the value were a trap representation, C does not require that the implementation refuse or fail to store it. In particular, footnote 50 to C11 is explicit on this point:

Thus, an automatic variable can be initialized to a trap representation without causing undefined behavior, but the value of the variable cannot be used until a proper value is stored in it.

Note also that foo() and test1() can be compiled by different runs of the compiler, such that when compiling test1(), the compiler knows nothing about the behavior of foo() beyond what is indicated by its prototype. C does not place translation-time requirements on implementations that depend on the runtime behavior of programs.

On the other hand, the requirements around trap representations would apply differently if the function were modified slightly:

void test2() {
    int *bar_ptr = NULL;
    bar_ptr = foo();      // UB (only) if foo() returns a trap representation
}

If the return value of foo() turns out to be a trap representation, then storing it in bar_ptr (as opposed to initializing bar_ptr with it) produces undefined behavior at runtime. Again, however, "undefined" means just what it says on the tin. C does not define any particular behavior for implementations to exhibit under the circumstances, and in particular, it does not require that programs terminate or manifest any externally visible behavior at all. And again, that's a runtime consideration, not a compile-time one.

Furthermore, if foo()'s return value turns out not to be a trap representation (being instead a pointer value that is not the address of any live object), then there's nothing wrong with reading that value itself, either:

void test3() {
    int *bar_ptr = foo();
    // UB (only) if foo() returned a trap representation:
    printf("foo() returned %p\n", (void *) bar_ptr);
}

The biggest and most commonly exercised undefined behavior in this area would be that of trying to dereference the return value of foo(), which, trap representation or not, almost surely does not point to a live int object:

void test4() {
    int *bar_ptr = foo();
    // UB under all circumstances for the given foo():
    printf("foo() returned a pointer to an int with value %d\n", *bar_ptr);
}

But again, that's a runtime consideration, not a compile-time one. And again, undefined means undefined. The C implementation should be expected to translate that successfully as long as there are in-scope declarations for the functions involved, and although some compilers might warn, they have no obligation to do so. The runtime behavior of function test4 is undefined, but that does not mean the program will necessarily segfault or terminate in some other manner. It might, but I expect that in practice, the undefined behavior manifested by a great many implementations would be to print "foo() returned a pointer to an int with value 0". Doing so is in no way inconsistent with C's requirements.
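For comparison, a well-defined counterpart to test4() could let the caller own the storage, so that the int being printed is alive for the whole call. This is only a sketch under that assumption: foo_into and test4_fixed are hypothetical names introduced here, not functions from the question or the answer.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical rewrite of foo(): the caller supplies the storage, so no
   pointer to a dead object ever escapes. */
void foo_into(int *out) {
    assert(out != NULL);
    *out = 0;
}

/* Counterpart to test4(): bar is alive for the whole body, so reading it
   is well defined. Returns the value so callers can check it. */
int test4_fixed(void) {
    int bar;
    foo_into(&bar);
    printf("foo_into() stored an int with value %d\n", bar);
    return bar;
}
```

Unlike test4(), the behavior here does not depend on what happens to be left on the stack after a call returns.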

As you stated, if you return the address of a local variable from a function and attempt to dereference (or even read) that address, you invoke undefined behavior.

The formal definition of undefined behavior is stated in section 3.4.3 of the C standard:

behavior, upon use of a nonportable or erroneous program construct or of erroneous data, for which this International Standard imposes no requirements

When undefined behavior occurs, the compiler makes no guarantees about what will happen. The program may crash, it may output strange results, or it may appear to work properly.

Generally speaking, compilers will assume code does not contain undefined behavior and work under that assumption. So when it does, all bets are off.

Just because the program could crash doesn't mean it will.

It will compile, of course, and some compilers will emit a diagnostic message informing you about the problem. Many compilers allow such messages (typically called warnings) to be treated as errors by passing command-line options.

UB means that the behaviour of your program when you run it is undefined.

The difficulty is that the Standard strongly implies(*) that the presence of code which would invoke Undefined Behavior if executed should not interfere with the execution of the program in cases where that code would not be executed. When the compiler generates code for the function, it has no idea whether code that calls the function might attempt to treat the return value as an address in some fashion that would not be defined either by the Standard or by any extended semantics the implementation might offer. For example, many implementations guarantee that if conversion from a pointer to a uintptr_t within the lifetime of its target yields a certain value, conversion of that pointer to uintptr_t will always yield that value, without regard for whether its target still exists. Commercial compilers often abide by the philosophy that if it's remotely conceivable that a programmer might want to do something (such as converting the address held in a pointer to uintptr_t and logging it, to allow comparison with other pointer values that were logged earlier in program execution), and there's nothing to be gained by not allowing it, the compiler may as well allow it.
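The logging idea can be sketched conservatively by converting the pointer while its target is still alive and comparing only the stored integers afterwards. The names address_log, record_address, and logged_addresses_match are hypothetical. The code also assumes the implementation provides uintptr_t and that converting the same valid pointer twice yields the same integer, which is typical but is not spelled out by the Standard.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical log of addresses, stored as plain integers. */
static uintptr_t address_log[2];

/* Convert the pointer while its target is still alive and store the integer. */
static void record_address(int slot, const void *p) {
    assert(slot >= 0 && slot < 2);
    address_log[slot] = (uintptr_t) p;
}

/* Logs the address of a short-lived object twice, then compares the logged
   integers after the object is gone. Only integers are read at that point. */
int logged_addresses_match(void) {
    {
        int target = 0;
        record_address(0, &target);
        record_address(1, &target);
    }
    /* target's lifetime has ended; the log holds integers, not pointers. */
    printf("logged 0x%" PRIxPTR " and 0x%" PRIxPTR "\n",
           address_log[0], address_log[1]);
    return address_log[0] == address_log[1];
}
```

Converting the dangling pointer itself after the target's lifetime ends, which is the case the paragraph above describes, relies on guarantees the implementation may or may not offer; this sketch deliberately avoids that step.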

(*) Under the One Program Rule, a compiler that can properly process at least one program that exercises the translation limits given in the Standard may do anything it likes when fed any other source text. Thus, if a compiler writer thought it more useful to reject all programs meeting some criteria, despite some such programs being Strictly Conforming, than to process such programs, such behavior would not make a compiler non-conforming. Nonetheless, the Standard elsewhere says that a program that would invoke UB when given some inputs could be a correct program with fully defined behavior when given other inputs.
