
Why doesn't this code seg fault? Does gcc turn it into a string literal?

#include <stdio.h>

void print(char* c) {
    printf("%s\n", c); //Uses %s to print a string
}

int main() {
    char a = 'd';
    print(&a);
    return 0;
}

How does printf know to stop after printing 'd' when there is no null terminating character at the end? When I ran it, it just printed 'd' and ended. Is this normal behaviour?

A one-letter string requires a char array of at least two elements to hold the letter and the null terminating character.

Your code invokes undefined behaviour. In your case, you were simply lucky that the next byte in memory was zero.

How does printf know to stop printing the next character after printing 'd' when there is not a null terminating character at the end?

It does not know. It was your lucky day. What will happen is undefined.

#include <stdio.h>

void printchar(char c) {
    printf("%c\n", c); 
}

void printstring(char *s) {
    printf("%s\n", s); 
}


int main() {
    char a = 'd';
    printchar(a);

    char b[2] = {'b',0};
    printstring(b);
    return 0;
}

Undefined behavior is undefined. But in practical environments, your particular example is practically bound not to segfault.

It would only segfault if printf, while looking for a terminating \0, found none and ended up in an unmapped or access-protected page.

When you give it a char on the stack, it will read on through the currently used portion of the stack (stacks grow downwards on most architectures, so increasing addresses lead into older, already-used frames). Since you're in main, the stack will not be very deep, containing only main's frame and what the OS and your libc put before it, but that is practically enough to provide plenty of zeros. For example, Unixes put argv there (terminated by a NULL pointer, i.e. zero bytes), the pointer array behind char **environ; (also NULL-terminated), plus all the strings themselves (each of them '\0'-terminated), and you could also have zeros in the calling code's frame in addition to that.

And even if you are not on a Unix (and can't rely on the environ and argv nul bytes), the likelihood of a nul byte somewhere in the used portion of the stack is very high.

Any evidently wrong code is wrong, and this code invokes undefined behavior.

That said, the specific case you showed has a good chance of printing correctly on many systems. The reason is incidental: char is the smallest integer type on most systems, while the preferred alignment is based on the processor's native word size, which is typically larger than a char. This implies padding after the char, and in practice that padding is often zero.

Now put together a char variable followed by zero padding and you get a null-terminated string.

The code is wrong, but the probability that it works is high...

I cannot speak for your machine, but on my machine the C code you provided results in this machine code fragment (found by compiling the program and invoking objdump -D -j.text on it):

    117b:   c6 45 f7 64             movb   $0x64,-0x9(%rbp)
    117f:   48 8d 45 f7             lea    -0x9(%rbp),%rax
    1183:   48 89 c7                mov    %rax,%rdi
    1186:   e8 be ff ff ff          call   1149 <print>

(Keep in mind that passing different options to your compiler, or using a different compiler altogether, might result in different machine code.)

The code stores a byte (movb) with hex value 0x64 on the stack. 0x64 is the ASCII value of the "d" character. Afterwards, it loads the address of that byte on the stack (lea) into the rax register and copies it to rdi, which is the register used to pass the first argument to functions on x86-64 Linux, which I use. This way, during the call to print in the next instruction, the first argument is a pointer to your character on the stack.

Using GDB one can inspect the contents of the stack memory before and after executing that particular movb.

Before:

00:0000│ rsp 0x7fffffffe0b0 ◂— 0x0
01:0008│     0x7fffffffe0b8 ◂— 0x4ef8437da34e1300
02:0010│ rbp 0x7fffffffe0c0 ◂— 0x1

After:

00:0000│ rsp 0x7fffffffe0b0 ◂— 0x6400000000000000
01:0008│     0x7fffffffe0b8 ◂— 0x4ef8437da34e1300
02:0010│ rbp 0x7fffffffe0c0 ◂— 0x1

(I have a GDB extension called pwndbg installed, so your output might look different.)

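Essentially, the 8-byte stack slot that the "d" is written to is all zero before the write. Note where the byte lands, though: the movb targets -0x9(%rbp), i.e. address 0x7fffffffe0b7, which is the last byte of that slot (x86-64 is little-endian, so it shows up as the top byte of 0x6400000000000000). The very next byte in memory, at 0x7fffffffe0b8, is the lowest byte of the following word, 0x4ef8437da34e1300, and that byte is 0x00. This null byte is what creates the "impromptu" printable string. The word looks like a stack protector canary, whose lowest byte glibc keeps at zero, but that is an inference from the dump rather than something I verified.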

As others have said, this is not behavior you can rely on; it is merely a coincidence. You should not write code this way in programs that are actually supposed to work :) but this is already emphasized enough in the other answers. Additionally, because this is undefined behavior, it is quite possible that on your machine the program works for entirely different reasons. To be sure of the answer, I recommend you debug your code and inspect what the memory contents look like in your case. To do that you should:

  1. Compile your program
  2. Disassemble it, for example using objdump -D -j.text <your program name>
  3. Find the assembly responsible for passing "d" to your print function. Hint: it will be somewhere in the main function
  4. If needed, inspect what exactly is happening with the memory/registers using GDB


 