Stack overflow for string in C++?

I made a small program that looked like this:

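#include <cstring>   // strcpy
#include <iostream>  // std::cout

void foo () {
  const char *str = "+++"; // length of str = 3 bytes
  char buffer[1];

  strcpy (buffer, str);    // copies '+', '+', '+', '\0' into a 1-byte array

  std::cout << buffer;
}

int main () {
  foo ();
}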

I was expecting a stack overflow exception because the buffer is smaller than str, but it printed +++ successfully. Can someone please explain why this happened?
Thank you very much.

Undefined Behavior (UB) happened, and you were unlucky that it did not crash.
Writing beyond the bounds of allocated memory is Undefined Behavior, and UB does not guarantee a crash. Anything might happen.
Undefined behavior means that the C++ standard places no requirements on what the program does.

You don't get a stack overflow because it's undefined behaviour, which means anything can happen.

Many compilers today have special flags that tell them to insert code to check for some stack problems (for example, the -fstack-protector family and -fsanitize=address in GCC and Clang), but you often need to explicitly tell the compiler to enable that.

Undefined behavior...

In case you actually care about why there's a good chance of getting a "correct" result in this case: there are a couple of contributing factors. Variables with auto storage class (i.e., normal local variables) will typically be allocated on the stack. In a typical case, all items on the stack will be a multiple of some specific size, most often the size of int. For example, on a typical 32-bit system, the smallest item you can allocate on the stack will be 32 bits. In other words, on your typical 32-bit system, there is room for four bytes (or four chars, if you prefer that term).

Now, as it happens, your source string contained only 3 characters, plus the NUL terminator, for a total of 4 characters. By pure bad chance, that just happened to be short enough to fit into the space the compiler was (sort of) forced to allocate for buffer, even though you told it to allocate less.

If, however, you'd copied a longer string to the target (possibly even just a single byte/char longer), chances of major problems would go up substantially (though in 64-bit software, you'd probably need longer still).
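One way to see how much room your own compiler leaves around a one-byte local is to compare addresses. This is only a minimal sketch: local_gap is a hypothetical name, and the number it returns depends entirely on the compiler, flags, and platform, so it may not match the "typical" layout described above.

```cpp
#include <cstdint>

// Hypothetical helper: distance in bytes between two one-byte local arrays.
// The value is implementation-specific; it only shows what THIS compiler,
// with THESE flags, happened to do.
std::uintptr_t local_gap() {
    char first[1] = {'a'};
    char second[1] = {'b'};
    const auto p1 = reinterpret_cast<std::uintptr_t>(&first[0]);
    const auto p2 = reinterpret_cast<std::uintptr_t>(&second[0]);
    return p1 > p2 ? p1 - p2 : p2 - p1;
}
```

Printing local_gap() with a few compilers and optimization levels shows how much this varies. A result of 1 means the two arrays were packed next to each other; a larger result means padding or other data sits between them.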

There is one other point to consider as well: depending on the system and the direction the stack grows, you might be able to write well past the end of the space you allocated and still have things appear to work. You've allocated buffer in foo. The only other thing defined in foo is str, but it just points to a string literal, so chances are that no space is actually allocated to store the address of the string literal. You end up with the string literal itself allocated statically (not on the stack) and its address substituted where you've used str. Therefore, if you write past the end of buffer, you may be just writing into whatever space is left at the top of the stack. In a typical case, the stack will be allocated one page at a time. On most systems, a page is 4K or 8K in size, so for a random amount of space used on the stack, you can expect an average of 2K or 4K free respectively.

In reality, since foo is called directly from main and nothing else has been called, you can expect the stack to be almost empty, so chances are that there's close to a full page of unused space at the top of the stack. Copying the string into the destination might therefore appear to work until/unless the source string was quite long (e.g., several kilobytes).

As to why it will often fail much sooner than that, though: in a typical case, the stack grows downward, but the addresses used by buffer[n] grow upward. In a typical case, the next item on the stack "above" buffer will be the return address from foo to main. Therefore, as soon as you write past the amount of space on the stack for buffer (which, as above, is likely to be larger than you specified), you'll end up overwriting that return address. In that case, the code inside foo will often appear to work fine, but as soon as execution (tries to) return from foo, it'll end up using the data you just wrote as the return address, at which point you're a lot more likely to see visible problems.

Outlining what happens:

Either you are lucky and it crashes at once, or, because the behavior is undefined, you could end up writing to a memory address used by something else. Say that you had two buffers, buffer[1] and longbuffer[100], and assume that the memory address of buffer[2] happens to be the same as that of longbuffer[0]. That would mean longbuffer now terminates at longbuffer[1] (because of the null terminator).

const char *str = "+++";
char longbuffer[100] = "lorem ipsum dolor sit amet";
char buffer[1];

strcpy (buffer, str); // undefined behavior: writes 4 bytes into a 1-byte array

/*
Assuming &buffer[2] == &longbuffer[0]:
buffer[0] = +
buffer[1] = +
buffer[2] = longbuffer[0] = +
buffer[3] = longbuffer[1] = \0 <- strcpy also copies the terminating \0
*/

std::cout << longbuffer; // would then output: +

Hope that helps clarify things. Please note that it's not very likely that these memory addresses will coincide in the random case, but it could happen, and the overwritten object doesn't even need to be of the same type: anything can be at the addresses of buffer[2] and buffer[3] before being overwritten by the copy. Then the next time you try to use your (now destroyed) variable, it might well crash, and that's when debugging becomes a bit tedious, since the crash doesn't seem to have much to do with the real problem (i.e., it crashes when you try to access a variable on your stack, while the real problem is that somewhere else in your code you destroyed it).
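The scenario can be reproduced without any undefined behavior by carving both "buffers" out of a single array, so every write stays in bounds. This is a sketch under an assumed layout (the helper name and the arena are hypothetical, and a real compiler is free to lay out locals differently):

```cpp
#include <cstring>
#include <string>

// Simulates the layout described above inside ONE array, so every write
// stays in bounds and the behavior is fully defined.
// Assumed (hypothetical) layout: bytes 0..1 stand in for `buffer` plus one
// byte of padding, bytes 2..101 stand in for `longbuffer`.
std::string simulate_adjacent_overflow() {
    char arena[102] = {};
    char *buffer = arena;          // pretend this is char buffer[1]
    char *longbuffer = arena + 2;  // pretend &buffer[2] == &longbuffer[0]

    std::strcpy(longbuffer, "lorem ipsum dolor sit amet");
    std::strcpy(buffer, "+++");    // writes '+','+','+','\0' into arena[0..3]

    // arena[2] is now '+' and arena[3] is '\0', so longbuffer reads as "+".
    return std::string(longbuffer);
}
```

Calling simulate_adjacent_overflow() returns the one-character string "+": the original contents of longbuffer were silently truncated by a copy that was aimed at buffer.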

There is no explicit bounds checking or exception throwing in strcpy; it's a C function. If you want to use C functions in C++, you're going to have to take on the responsibility of checking bounds etc., or switch to using std::string.

In this case it did work, but in a critical system, taking this approach might mean that your unit tests pass but, in production, your code barfs: not a situation that you want.
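Both options mentioned in this answer can be sketched as follows. The helper names copy_checked and copy_as_string are hypothetical, not standard functions, and refusing the copy is just one possible policy for an oversized source (truncating or throwing would be others):

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical helper (not a standard function): copies src into dest only
// if it fits, including the terminating '\0'. Returns false and leaves dest
// as an empty string when src is too long.
bool copy_checked(char *dest, std::size_t dest_size, const char *src) {
    assert(dest != nullptr && src != nullptr && dest_size > 0);
    const std::size_t needed = std::strlen(src) + 1;  // +1 for '\0'
    if (needed > dest_size) {
        dest[0] = '\0';
        return false;
    }
    std::memcpy(dest, src, needed);
    return true;
}

// The std::string alternative: the string owns its storage and grows as needed.
std::string copy_as_string(const char *src) {
    return std::string(src);
}
```

With char buffer[1], copy_checked(buffer, sizeof buffer, "+++") returns false instead of writing past the array, while copy_as_string("+++") simply yields a std::string holding all three characters.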

Stack corruption is happening. It's undefined behaviour, and luckily a crash didn't occur. Make the modifications below in your program and run it; it is then much more likely to crash because of stack corruption.

void foo () {
  const char *str = "+++"; // length of str = 3 bytes
  int a = 10;
  int *p = NULL;
  char buffer[1];
  int *q = NULL;
  int b = 20;

  p = &a;
  q = &b;

  std::cout << *p;
  std::cout << *q;

  //strcpy (buffer, str);

  // Now uncomment the strcpy: it may overwrite p or q, in which case one of
  // the cout statements below is likely to crash.
  std::cout << *p;
  std::cout << *q;

  std::cout << buffer;
}
