
Can a string literal in C be modified?

I recently had a question. I know that a string literal pointed to as in the code below is placed in the .rodata section, and that this section is read-only. I also saw in the C11 standard that writing to this memory address is undefined behavior. I was aware that Borland's Turbo C compiler can write where the pointer points. Is that because the processor operated in real mode on some systems of the time, such as MS-DOS? Or is it independent of the operating mode of the processor? Is there any other compiler that writes through the pointer without triggering a memory protection fault when the processor is in protected mode?

#include <stdio.h>

int main(void) {
    char *st = "aaa";
    *st = 'b'; 
    return 0;
}

Compiled with Turbo C under MS-DOS, this code is able to write to that memory.

As has been pointed out, trying to modify a string literal in C results in undefined behavior. There are several reasons for this.

One reason is that the string may be placed in read-only memory. This allows it to be shared across multiple instances of the same program, and means the memory does not have to be saved to disk if the page it's on is paged out (since the page is read-only, it can be reloaded later from the executable). It also helps detect run-time errors by raising an error (e.g. a segmentation fault) if an attempt is made to modify it.

Another reason is that the string may be shared. Many compilers (e.g. gcc) will notice when the same literal string appears more than once in a compilation unit, and will use the same storage for every occurrence. So if a program modifies one instance, it could affect the others as well.

There is also never a need to do this, since the same intended effect can easily be achieved by using a static character array. For instance:

#include <stdio.h>

int main(void) {
    static char st_arr[] = "aaa";
    char *st = st_arr;
    *st = 'b'; 
    return 0;
}

This does exactly what the posted code attempted to do, but without any undefined behavior. It also takes the same amount of memory. In this example, the string "aaa" is used as an array initializer and does not have any storage of its own. The array st_arr takes the place of the constant string from the original example, but (1) it will not be placed in read-only memory, and (2) it will not be shared with any other references to the string. So it's safe to modify it, if in fact that's what you want.

Is there any other compiler that writes to the pointer and does not take any memory breach failure using the processor in protected mode?

How can some GCC compilers modify a constant char pointer?

GCC 3 and earlier supported gcc -fwritable-strings to let you compile old K&R C, where this was apparently legal, according to https://gcc.gnu.org/onlinedocs/gcc-3.3.6/gcc/Incompatibilities.html . (It's undefined behaviour in ISO C and thus a bug in an ISO C program.) That option defines the behaviour of the assignment, which ISO C leaves undefined.

GCC 3.3.6 manual - C Dialect options

-fwritable-strings
Store string constants in the writable data segment and don't uniquize them. This is for compatibility with old programs which assume they can write into string constants.

Writing into string constants is a very bad idea; "constants" should be constant.

GCC 4.0 removed that option (release notes); the last GCC 3 release was GCC 3.4.6 in March 2006, although the option had apparently become buggy in that version.

gcc -fwritable-strings would treat string literals like non-const anonymous character arrays (see @gnasher's answer), so they go in the .data section instead of .rodata , and thus get linked into a segment of the executable that's mapped to read+write pages, not read-only. (Executable segments have basically nothing to do with x86 segmentation; a segment is just a start+range memory-mapping from the executable file to memory.)

It would also disable duplicate-string merging, so char *foo() { return "hello"; } and char *bar() { return "hello"; } would return different pointer values, instead of merging identical string literals.




Linker option: still Undefined Behaviour so probably not viable

On GNU/Linux, linking with ld -N ( --omagic ) will make the text (as well as data) section read+write. This may apply to .rodata even though modern GNU Binutils ld puts .rodata in its own section (normally with read but not exec permission) instead of making it part of .text . Having .text writeable could easily be a security problem: you never want a page to be write+exec at the same time, otherwise some bugs like buffer overflows can turn into code-injection attacks.

To do this from gcc, use gcc -Wl,-N to pass that option on to ld when linking.

This doesn't do anything about it being Undefined Behaviour to write const objects. E.g. the compiler will still merge duplicate strings, so writing into one char *foo = "hello"; can affect all other uses of "hello" in the whole program, even across files.

If you want something writeable, use static char foo[] = "hello"; where the quoted string is just an array initializer for a non-const array. As a bonus, this is more efficient than static char *foo = "hello"; at global scope, because there's one fewer level of indirection to get to the data: it's just an array instead of a pointer stored in memory.

Your literal "aaa" produces a static array of four characters 'a', 'a', 'a', '\0' in an anonymous location, to be treated as const, and yields a pointer to the first 'a', as if cast to char*. (In C the array's type is formally char[4] rather than const char[4], but modifying it is undefined all the same.)

Trying to modify any of the four characters is undefined behaviour. Undefined behaviour can do anything: modifying the char as intended, pretending to modify the char, doing nothing, or crashing.

It's basically the same as static const char anonymous[4] = { 'a', 'a', 'a', '\0' }; char* st = (char*) &anonymous [0];

You are asking whether or not the platform may cause undefined behavior to be defined. The answer to that question is yes.

But you are also asking whether or not the platform defines this behavior. In fact it does not.

Under some optimization hints, the compiler will merge string constants, so that writing to one constant will write to the other uses of that constant. I used this compiler once, and it was quite capable of merging strings.

Don't write this code. It's not good. You will regret writing code in this style when you move on to a more modern platform.

To add to the correct answers above, DOS runs in real mode, which has no memory protection, so there is no read-only memory: all memory is writable. Hence, writing to the literal worked in practice at the time (as did writing to any sort of const variable), even though the language itself did not define the result.
