Why doesn't my program crash when I write past the end of an array?

Question

Why does the code below run without crashing at runtime?

Also, the size seems to depend entirely on the machine/platform/compiler. I can even use an index of up to 200 on a 64-bit machine. How would a segmentation fault in the main function get detected by the OS?

int main(int argc, char* argv[])
{
    int arr[3];
    arr[4] = 99;
}

Where does this buffer space come from? Is this the stack allocated to a process?

Answer 1

Something I wrote some time ago for educational purposes...

Consider the following C program:

int q[200];

int main(void) {
    int i;
    for(i=0;i<2000;i++) {
        q[i]=i;
    }
}

After compiling and executing it, a core dump is produced:

$ gcc -ggdb3 segfault.c
$ ulimit -c unlimited
$ ./a.out
Segmentation fault (core dumped)

Now using gdb to perform a post mortem analysis:

$ gdb -q ./a.out core
Program terminated with signal 11, Segmentation fault.
[New process 7221]
#0  0x080483b4 in main () at s.c:8
8       q[i]=i;
(gdb) p i
$1 = 1008
(gdb)

Huh, the program didn't segfault when it wrote outside the 200 items allocated; instead it crashed when i=1008. Why?

Enter pages.

One can determine the page size in several ways on UNIX/Linux. One way is to use the system function sysconf() like this:

#include <stdio.h>
#include <unistd.h> // sysconf(3)

int main(void) {
    printf("The page size for this system is %ld bytes.\n",
            sysconf(_SC_PAGESIZE));

    return 0;
}

which gives the output:

The page size for this system is 4096 bytes.

or one can use the command-line utility getconf like this:

$ getconf PAGESIZE
4096

Post mortem

It turns out that the segfault occurs not at i=200 but at i=1008; let's figure out why. Start gdb to do some post mortem analysis:

$ gdb -q ./a.out core

Core was generated by `./a.out'.
Program terminated with signal 11, Segmentation fault.
[New process 4605]
#0  0x080483b4 in main () at seg.c:6
6           q[i]=i;
(gdb) p i
$1 = 1008
(gdb) p &q
$2 = (int (*)[200]) 0x804a040
(gdb) p &q[199]
$3 = (int *) 0x804a35c

q[199], the last element of q, starts at address 0x804a35c, so q ends with the byte at 0x804a35f. The page size is, as we saw earlier, 4096 bytes, and with the machine's 32-bit word size a virtual address breaks down into a 20-bit page number and a 12-bit offset.

q[] ended in virtual page number:

0x804a = 32842

offset:

0x35c = 860

Since q[199] occupies 4 bytes starting at that offset, the first 864 bytes of the page were in use, so there were still:

4096 - 864 = 3232 bytes left on the page of memory on which q[] was allocated. That space can hold:

3232 / 4 = 808 integers, and the code treated it as if it contained elements of q at positions 200 to 1007.

We all know that those elements don't exist, and the compiler didn't complain; neither did the hardware, since we have write permission to that page. Only when i=1008 did q[i] refer to an address on a different page, one for which we didn't have write permission; the virtual memory hardware detected this and triggered a segfault.

An integer is stored in 4 bytes, meaning that this page contains 808 (3232/4) additional fake elements, so the hardware still allows accesses from q[200], q[201] all the way up to element 199+808=1007 (q[1007]) without triggering a segfault. When accessing q[1008] you enter a new page for which the permissions are different.

Answer 2

Since you're writing outside the boundaries of your array, the behaviour of your code is undefined.

It is the nature of undefined behaviour that anything can happen, including lack of segfaults (the compiler is under no obligation to perform bounds checking).

You're writing to memory you haven't allocated but that happens to be there and that, probably, is not being used for anything else. Your code might behave differently if you make changes to seemingly unrelated parts of the code, to your OS, compiler, optimization flags etc.

In other words, once you're in that territory, all bets are off.

Answer 3

Exactly when and where a local-variable buffer overflow crashes depends on a few factors:

The amount of data already on the stack at the time the function containing the overflowing variable access is called
The total amount of data written into the overflowing variable/array

Remember that stacks grow downwards. I.e. process execution starts with a stack pointer close to the end of the memory to be used as the stack. It doesn't start at the last mapped word though, and that's because the system's initialization code may decide to pass some sort of "startup info" to the process at creation time, and often does so on the stack.

That is the usual failure mode - a crash when returning from the function that contained the overflow code.

If the total amount of data written into a buffer on the stack is larger than the total amount of stack space used previously (by callers / initialization code / other variables), then you'll get a crash at whatever memory access first runs beyond the top (beginning) of the stack. The crashing address will be just past a page boundary - SIGSEGV due to accessing memory beyond the top of the stack, where nothing is mapped.

If that total is less than the size of the used part of the stack at this time, then it'll work just fine and crash later; in fact, on platforms that store return addresses on the stack (which is true for x86/x64), it will crash when returning from your function. That's because the CPU instruction ret actually takes a word from the stack (the return address) and redirects execution there. If, instead of the expected code location, this address contains garbage, an exception occurs and your program dies.

To illustrate this: when main() is called, the stack looks like this (in a 32-bit x86 UNIX program):

[ esp          ] <return addr to caller> (which exits/terminates process)
[ esp + 4      ] argc
[ esp + 8      ] argv
[ esp + 12     ] envp <third arg to main() on UNIX - environment variables>
[ ...          ]
[ ...          ] <other things - like actual strings in argv[], envp[]>
[ END          ] PAGE_SIZE-aligned stack top - unmapped beyond

When main() starts, it will allocate space on the stack for various purposes, amongst others to host your to-be-overflowed array. This will make it look like:

[ esp          ] <current bottom end of stack>
[ ...          ] <possibly local vars of main()>
[ esp + X      ] arr[0]
[ esp + X + 4  ] arr[1]
[ esp + X + 8  ] arr[2]
[ esp + X + 12 ] <possibly other local vars of main()>
[ ...          ] <possibly other things (saved regs)>

[ old esp      ] <return addr to caller> (which exits/terminates process)
[ old esp + 4  ] argc
[ old esp + 8  ] argv
[ old esp + 12 ] envp <third arg to main() on UNIX - environment variables>
[ ...          ]
[ ...          ] <other things - like actual strings in argv[], envp[]>
[ END          ] PAGE_SIZE-aligned stack top - unmapped beyond

This means you can happily access way beyond arr[2].

For a taste of the different crashes resulting from buffer overflows, try this one:

#include <stdlib.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int i, arr[3];

    for (i = 0; i < atoi(argv[1]); i++)
        arr[i] = i;

    do {
        printf("argv[%d] = %s\n", argc, argv[argc]);
    } while (--argc);

    return 0;
}

and see how different the crash will be when you overflow the buffer by a little bit (say, 10), compared to when you overflow it beyond the end of the stack. Try it with different optimization levels and different compilers. Quite illustrative, as it shows both misbehaviour (it won't always print all argv[] correctly) as well as crashes in various places, maybe even endless loops (if, e.g., the compiler places i or argc on the stack and the code overwrites it during the loop).

Answer 4

By using an array type, which C++ has inherited from C, you are implicitly asking not to have a range check.

If you try this instead

#include <vector>

int main(int argc, char* argv[])
{
    std::vector<int> arr(3);

    arr.at(4) = 99; // at() checks the index: 4 >= size 3, so it throws
}

you will get an exception thrown (std::out_of_range).

So C++ offers both a checked and an unchecked interface. It is up to you to select the one you want to use.

Answer 5

That's undefined behavior - you simply don't observe any problems. The most likely reason is that you overwrite an area of memory the program's behavior doesn't depend on - that memory is technically writable (the default stack size is commonly somewhere between 1 and 8 megabytes, depending on the platform) and you see no error indication. You shouldn't rely on this.

Answer 6

To answer your question why it is "undetected": most C compilers do not analyse at compile time what you are doing with pointers and with memory, and so nobody notices at compile time that you've written something dangerous. At runtime, there is also no controlled, managed environment that babysits your memory references, so nobody stops you from accessing memory that you aren't entitled to. The memory happens to be allocated to you at that point (because it's just part of the stack not far from your function), so the OS doesn't have a problem with that either.

If you want hand-holding while you access your memory, you need a managed environment like Java or CLI, where your entire program is run by another, managing program that looks out for those transgressions.

Answer 7

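Apparently, when you ask the computer to allocate a certain number of bytes in memory, for example char array[10], it gives us a few extra bytes so that we don't run into a segfault. Using those bytes is still not safe, however, and trying to reach further into memory will eventually cause the program to crash.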

Answer 8

Your code has Undefined Behavior. That means it can do anything or nothing. Depending on your compiler and OS etc., it could crash.

That said, with many if not most compilers your code will not even compile.

That's because you have void main, while both the C standard and the C++ standard require int main.

About the only compiler that's happy with void main is Microsoft's Visual C++.

That's a compiler defect, but since Microsoft has lots of example documentation and even code generation tools that generate void main, they will likely never fix it. However, consider that writing Microsoft-specific void main is one character more to type than standard int main. So why not go with the standards?

Cheers & hth.,

Answer 9

A segmentation fault occurs when a process tries to overwrite a page in memory which it doesn't own. Unless you run a long way over the end of your buffer, you aren't going to trigger a segfault.

The stack is located somewhere in one of the blocks of memory owned by your application. In this instance you have just been lucky if you haven't overwritten something important. You have perhaps overwritten some unused memory. If you were a bit more unlucky, you might have overwritten the stack frame of another function on the stack.


9 answers

Answer 1
80, accepted, 2011-06-23 11:04:46

Answer 2
7, 2011-06-23 11:01:43

Answer 3
4, 2011-06-23 11:40:28

Answer 4
3, 2011-06-23 11:06:30

Answer 5
2, 2011-06-23 11:01:49

Answer 6
1, 2011-06-23 11:08:02

Answer 7
0, 2019-01-24 13:46:45

Answer 8
0, 2011-06-23 11:03:01

Answer 9
0, 2011-06-23 11:06:16
