
Is modifying a string pointed to by a pointer valid?

Here's a simple example of a program that concatenates two strings.

#include <stdio.h>

void strcat(char *s, char *t);

void strcat(char *s, char *t) {
    while (*s++ != '\0');
    s--;
    while ((*s++ = *t++) != '\0');
}

int main() {
    char *s = "hello";
    strcat(s, " world");
    while (*s != '\0') {
        putchar(*s++);
    }
    return 0;
}

I'm wondering why it works. In main(), I have a pointer to the string "hello". According to the K&R book, modifying a string like that is undefined behavior. So why is the program able to modify it by appending " world"? Or is appending not considered modifying?

Undefined behavior means a compiler can emit code that does anything. Appearing to work is one possible outcome of undefined behavior.

I +1'd MSN, but as for why it works, it's because nothing has come along to fill the space behind your string yet. Declare a few more variables, add some complexity, and you'll start to see some wackiness.

Perhaps surprisingly, your compiler has placed the literal "hello" in read/write initialized data instead of read-only initialized data. Your assignment clobbers whatever is adjacent to that spot, but your program is small and simple enough that you don't see the effects. (Put it in a for loop and see if you are clobbering the " world" literal.)

It fails on Ubuntu x64 because gcc puts string literals in read-only data, and when you try to write, the hardware MMU objects.

You were lucky this time.
Especially in debug mode, some compilers put spare memory (often filled with some obvious value) around declarations so that you can find code like this.

It also depends on how the pointer is declared. For example, you can change both ptr and what ptr points to:

char * ptr;

You can change what ptr points to, but not ptr itself:

char * const ptr;

You can change ptr, but not what ptr points to:

const char * ptr;

(char const * ptr; means the same thing.)

You can't change anything:

const char * const ptr;

I'm wondering why it works

It doesn't. It causes a segmentation fault on Ubuntu x64; for code to work, it shouldn't just work on your machine.

Moving the modified data to the stack gets around the data area protection in Linux:

int main() {
    char b[] = "hello";
    char c[] = " ";
    char *s = b;

    strcat(s, " world");

    puts(b);
    puts(c);

    return 0;
}

Though you are then only safe as long as " world" fits in the unused space between stack data. Change b to "hello to" and the stack protector on Linux detects the corruption:

*** stack smashing detected ***: bin/clobber terminated

According to the C99 specification (C99: TC3, 6.4.5, §5), string literals are

[...] used to initialize an array of static storage duration and length just sufficient to contain the sequence. [...]

which means they have the type char[], i.e. modification is possible in principle. Why you shouldn't do it is explained in §6:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Different string literals with the same contents may (but don't have to) be mapped to the same memory location. As the behaviour is undefined, compilers are free to put them in read-only sections so that writes fail cleanly instead of introducing possibly hard-to-detect error sources.

The compiler is allowing you to modify s because you have improperly marked it as non-const. A pointer to a static string like that should be

const char *s = "hello";

With the const modifier missing, you've basically disabled the safety that prevents you from writing into memory that you shouldn't write into. C does very little to keep you from shooting yourself in the foot. In this case you got lucky and only grazed your pinky toe.

s points to a bit of memory that holds "hello", but was not intended to contain more than that. This means that it is very likely that you will be overwriting something else. That is very dangerous, even though it may seem to work.

Two observations:

  1. The * in *s-- is not necessary. s-- would suffice, because you only want to decrement the value.
  2. You don't need to write strcat yourself. It already exists (you probably knew that, but I'm telling you anyway :-)).
