What will the compiler do?

I've been programming for a few years but, embarrassingly, there are one or two things I'm still not fully clear about.

In the basic example code below, when the compiler encounters myFunc(), where will str1 and str2 get stored?

They are pointers to string literals, so I assume the string literals will be stored in read-only memory, but what difference does it make that one pointer is a static local and the other is not? Also, I thought local variables are stored on the stack and are not allocated until the function is called. This is confusing.

In the case of the integers, var1 is non-static but var2 is static. Will the compiler place var2 in the data segment at compilation time? I've read in another post, "When do function-level static variables get allocated/initialized?", that local static variables are created and initialised the first time they are used, not during compilation. So in that case, what if the function is never called?

Thanks in advance for your expertise.

EDITED: To call myFunc() from main(). It was a typo that myFunc() was never even called.

int myFunc()
{
    static char* str1 = "Hello";
    char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;

}

int main()
{

    return myFunc();
}

EDIT:

The other answer and comments are correct: as is, your variables will be optimized out because they aren't even used. But let's have a little fun and actually use them to see what happens.

I compiled the OP's program as-is with gcc -S trial.c, and although myFunc was never called, nothing else about this answer changes.

I've slightly modified your program to actually use those variables so we can learn a little more about what the compiler and linker will do. Here it is:

#include <stdio.h>

int myFunc()
{
    static const char* str1 = "Hello";
    const char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;
    printf("%s %s %d %d\n", str1, str2, var1, var2);
    return 0;
}

int main()
{
    return myFunc();
}

I compiled with gcc -S trial.c and got the following assembly file:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "%s %s %d %d\12\0"
    .text
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $64, %rsp
    .seh_stackalloc 64
    .seh_endprologue
    leaq    .LC0(%rip), %rax
    movq    %rax, -8(%rbp)
    movl    $1, -12(%rbp)
    movl    var2.3086(%rip), %edx
    movq    str1.3083(%rip), %rax
    movl    -12(%rbp), %r8d
    movq    -8(%rbp), %rcx
    movl    %edx, 32(%rsp)
    movl    %r8d, %r9d
    movq    %rcx, %r8
    movq    %rax, %rdx
    leaq    .LC1(%rip), %rcx
    call    printf
    movl    $0, %eax
    addq    $64, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .def    __main; .scl    2;  .type   32; .endef
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    pushq   %rbp
    .seh_pushreg    %rbp
    movq    %rsp, %rbp
    .seh_setframe   %rbp, 0
    subq    $32, %rsp
    .seh_stackalloc 32
    .seh_endprologue
    call    __main
    call    myFunc
    addq    $32, %rsp
    popq    %rbp
    ret
    .seh_endproc
    .data
    .align 4
var2.3086:
    .long   8
    .section .rdata,"dr"
.LC2:
    .ascii "Hello\0"
    .data
    .align 8
str1.3083:
    .quad   .LC2
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

var1 has no label of its own in the assembly file. It's just the constant 1, stored into a stack slot (movl $1, -12(%rbp)).

At the top of the assembly file, we see "World" (the literal that str2 points to) in the .rdata section. Lower down, the string "Hello" is also in the .rdata section, but the label for str1 (which holds the label, or address, of "Hello") is in the .data section. var2 is also in the .data section.

Here's a Stack Overflow question that delves a little deeper into why this happens.

Another Stack Overflow question points out that .rdata is the read-only counterpart of .data, and explains the different sections.

Hope this helps.


EDIT:

I decided to try this with the -O3 compiler flag (high optimization). Here's the assembly file that I got:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "Hello\0"
.LC2:
    .ascii "%s %s %d %d\12\0"
    .section    .text.unlikely,"x"
.LCOLDB3:
    .text
.LHOTB3:
    .p2align 4,,15
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    leaq    .LC0(%rip), %r8
    leaq    .LC1(%rip), %rdx
    leaq    .LC2(%rip), %rcx
    movl    $8, 32(%rsp)
    movl    $1, %r9d
    call    printf
    nop
    addq    $56, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE3:
    .text
.LHOTE3:
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
.LCOLDB4:
    .section    .text.startup,"x"
.LHOTB4:
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE4:
    .section    .text.startup,"x"
.LHOTE4:
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

var1 is now just a constant 1 that is placed in a register (r9d). var2 is also just a constant, but it's placed on the stack (movl $8, 32(%rsp)) because it is the fifth argument to printf. Also, the strings "Hello" and "World" are accessed in a more direct (efficient) way.

So, I decided that I wanted to try something slightly different:

#include <stdio.h>

void myFunc()
{
    static const char* str1 = "Hello";
    const char* str2 = "World";

    int var1 = 1;
    static int var2 = 8;
    printf("%s %s %d %d\n", str1, str2, var1, var2);

    var1++;
    var2++;
    printf("%d %d", var1, var2);
}

int main()
{
    myFunc();
    myFunc();
    return 0;
}

And the associated assembly, using gcc -O3 -S trial.c:

    .file   "trial.c"
    .section .rdata,"dr"
.LC0:
    .ascii "World\0"
.LC1:
    .ascii "Hello\0"
.LC2:
    .ascii "%s %s %d %d\12\0"
.LC3:
    .ascii "%d %d\0"
    .section    .text.unlikely,"x"
.LCOLDB4:
    .text
.LHOTB4:
    .p2align 4,,15
    .globl  myFunc
    .def    myFunc; .scl    2;  .type   32; .endef
    .seh_proc   myFunc
myFunc:
    subq    $56, %rsp
    .seh_stackalloc 56
    .seh_endprologue
    movl    var2.3086(%rip), %eax
    leaq    .LC0(%rip), %r8
    leaq    .LC1(%rip), %rdx
    leaq    .LC2(%rip), %rcx
    movl    $1, %r9d
    movl    %eax, 32(%rsp)
    call    printf
    movl    var2.3086(%rip), %eax
    leaq    .LC3(%rip), %rcx
    movl    $2, %edx
    leal    1(%rax), %r8d
    movl    %r8d, var2.3086(%rip)
    addq    $56, %rsp
    jmp printf
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE4:
    .text
.LHOTE4:
    .def    __main; .scl    2;  .type   32; .endef
    .section    .text.unlikely,"x"
.LCOLDB5:
    .section    .text.startup,"x"
.LHOTB5:
    .p2align 4,,15
    .globl  main
    .def    main;   .scl    2;  .type   32; .endef
    .seh_proc   main
main:
    subq    $40, %rsp
    .seh_stackalloc 40
    .seh_endprologue
    call    __main
    call    myFunc
    call    myFunc
    xorl    %eax, %eax
    addq    $40, %rsp
    ret
    .seh_endproc
    .section    .text.unlikely,"x"
.LCOLDE5:
    .section    .text.startup,"x"
.LHOTE5:
    .data
    .align 4
var2.3086:
    .long   8
    .ident  "GCC: (Rev1, Built by MSYS2 project) 5.4.0"
    .def    printf; .scl    2;  .type   32; .endef

This looks a little more like the original. var1 is still optimized down to constants, but var2 is back in the .data section, since it is now modified and must keep its value between calls. "Hello" and "World" are still in the .rdata section because they are constant.

One of the comments points out that this would be different on different platforms with different compilers. I encourage you to try it out.

static const char* str1 = "Hello";

str1 is a static local pointer to a string literal; the literal itself will typically be stored in read-only memory.

const char* str2 = "World";

str2 is a local, "stack-allocated" pointer to a string literal; the literal itself will typically be stored in read-only memory.

The values of str1 and str2 are the respective addresses of the string literals they point to.

int var1 = 1;
static int var2 = 8;

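If these lines of code are never reached, var1 is never created. var2 is different: being static, it exists for the whole run of the program, and in C it is initialized before program startup rather than on first use (the first-use rule applies to C++ statics with non-constant initializers). I don't know whether the compiler still sets aside a block of memory for it at compile time when the function is never called.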

What the compiler does must be based (assuming a correctly working compiler) on the semantics of the code, so that's what I'll discuss.

First, a fairly minor point. By declaring a function with (), you specify that it takes a fixed but unspecified number and type(s) of arguments. That's an obsolescent form of declaration/definition, and there's rarely if ever a good reason to use it. (Empty parentheses have a different meaning in C++, but you're asking about C.) To specify that a function has no parameters, use (void) rather than () (especially for main, since it's not 100% clear that int main() must be accepted by a conforming compiler).

With that change:

int myFunc(void)
{
    static char* str1 = "Hello";
    char* str2 = "World";
    int var1 = 1;
    static int var2 = 8;
}

int main(void)
{
    return myFunc();
}

This program does nothing; it produces no output, and has no side effects. A compiler is permitted to compile it down to nearly nothing. But let's ignore that and assume that nothing is discarded.

There are two important concepts to consider: scope and lifetime (also known as storage duration). The scope of an identifier is the region of program text in which it is visible. It's purely a compile-time concept. The lifetime of an object is the portion of program execution during which that object exists. It's purely a run-time concept. The two are often confused, particularly when you use the words "local" and "global".

An object with automatic storage duration is created on entry to the block in which it's defined, and (logically) destroyed on exit from that block. In your program, the relevant block is enclosed by the { and } in the definition of myFunc().

An object with static storage duration exists during the entire run time of the program.

static char* str1 = "Hello";

"Hello" is a string literal. It specifies a static array of type char[6]; that array (at least logically) exists during the entire execution of the program. You are not allowed to modify the contents of that array -- but for historical reasons, it's not const, and a compiler isn't required to warn you if you try to modify it. String literals are commonly stored in read-only memory (probably not physical ROM, but virtual memory that's marked as read-only).

The pointer object str1 also has static storage duration, though its name is visible only within the enclosing block ("block scope"). It's initialized to point to the initial character of "Hello". This initialization logically occurs before entry to main. Since a string literal is effectively read-only, it would have been better to use const to avoid the risk of accidentally trying to modify it:

static const char *str1 = "Hello";

Next:

char* str2 = "World";

The name of the pointer object str2 has the same kind of block scope as str1, but the pointer object itself has automatic storage duration. It is created on entry to the enclosing block and destroyed on exit. It's initialized to point to the initial character of "World"; that initialization takes place when execution reaches the declaration. Again, it would be better to add a const to the declaration.

int var1 = 1;
static int var2 = 8;

var1 has block scope and automatic storage duration. It's initialized to 1 when its declaration is reached at run time. var2 has block scope and static storage duration. The object exists for the entire execution of the program, and it's initialized to 8 before entry to main().

Now we run into a bit of a problem. You've defined myFunc() to return an int result, but you don't actually return anything. As it happens, this isn't invalid by itself, but if the result is used by a caller (as it is by your main() function), the behavior is undefined. The fix is simple: add a return 0; before the closing }.

Assuming you've added that, main calls myFunc. During execution of myFunc, str2 and var1 are allocated somehow and are initialized as I've described. (Nothing happens to str1 or var2 because they're static.) On return from the function, the storage allocated for str2 and var1 is released, effectively destroying the objects.


But the question you asked was: What will the compiler do? And the answer to that is: It will generate whatever code is necessary to implement the semantics I've just described. That's really all the C standard requires.

In practice, most compilers generate code that allocates variables with automatic storage duration on the "stack". The "stack" is usually a contiguous region of memory, starting from some fixed base address, that grows in one direction as items are added to it and shrinks in the other direction as items are removed. It's typically managed via a CPU register, the "stack pointer". (Some CPUs also have a "frame pointer".) But in fact all that the C standard requires is that such objects are allocated and deallocated in a last-in first-out manner -- and the actual allocation and deallocation needn't take place when you'd expect, as long as the resulting behavior is the same. For example, if you define a local object inside a loop, it might be allocated and deallocated on each iteration, or its allocation might be folded into the surrounding scope. The C standard doesn't care (and, in most cases, neither should you). There are even some compilers that don't use a contiguous stack at all; rather, the storage for each function call is allocated from a heap. A contiguous stack is the best solution 90+% of the time, but it's not required.

Objects with static storage duration are typically allocated on program startup, before main is called. Most systems store the initial contents of any initialized static objects in the executable file, so they can be loaded into memory. (That's likely to include string literals.) For static objects whose initial value is zero, the executable might just contain information about how much zeroed memory to allocate.

As for the generated instructions that operate on this data, they are entirely dependent on the CPU being targeted, and probably on the system ABI.

Your code should draw at least a warning from the compiler, because the function never returns a value even though its declared return type is int.

Anyway, on my machine it does generate code. Without optimization, code is emitted in the function to allocate the local str2, while str1 and var2 are allocated in the data section and initialized with their respective values. With optimization, only a trivial function body is emitted, and the unused local variables disappear, as do the unused statics.

To observe this you can at least examine the object code with nm:

$ gcc -o p p.c
$ nm p
0000000100000f90 T _main
0000000100000f70 T _myFunc
0000000100001000 d _myFunc.str1
0000000100001008 d _myFunc.var2
$ gcc -O3 -o p2 p.c
$ nm p2
0000000100000fb0 T _main
0000000100000fa0 T _myFunc

If you want more details, generate assembler code with -S and observe what happens.

The compiler will produce a program that takes no input, does nothing, and emits no output.

All of those declarations are completely irrelevant, as they do not contribute anything to the [non-existent] result of the program. You might say they "get optimised out", though the reality is that they literally have no analogue in your resulting compiled executable.

static variables, even those declared inside a function, are stored alongside globals; only their names are limited to the function's scope. In C they are initialized before program startup; in C++, a function-level static with a non-constant initializer is initialized the first time control passes through its declaration. Non-static variables are, in most compilers, allocated on the stack when the function is entered and initialized when their declaration is reached. Some compilers store local variables elsewhere.
