
What is the reason behind the following C char array storage implementation?

What is the implementation reason behind the following char array implementation?

char *ch1 = "Hello"; // Read-only data
/* if we try ch1[1] = ch1[2];
we will get Seg fault since the value is stored in
the constant code segment */

char ch2[] = "World"; // Read-write data
/* if we try ch2[1] = ch2[2]; will work. */

According to the book Head First C (pages 73-74), the ch2[] array is stored both in the constant code segment and on the function stack. What is the reason for duplicating it in both code and stack memory? Why can't the value be kept only on the stack, given that it is not read-only data?

First, let's clear something up. String literals are not necessarily read-only data; it's just that trying to change them is undefined behaviour.

It doesn't necessarily have to crash; it may work just fine. But, being undefined behaviour, you shouldn't rely on it if you want your code to run on another implementation, another version of the same implementation, or even next Wednesday.

This may well stem from a time before standards were in place (the original ANSI/ISO mandate was to codify existing practice rather than create a new language). In many implementations, strings would share space for efficiency, such as the code:

char *good = "successful";
char *bad = "unsuccessful";

resulting in:

good---------+
bad--+       |
     |       |
     V       V
   | u | n | s | u | c | c | e | s | s | f | u | l | \0 |

Hence, if you changed one of the characters in good, it would also change bad.

The reason you can do it with something like:

char indifferent[] = "meh";

is that, while good and bad point to a string literal, that statement actually creates a character array big enough to hold "meh" and then copies the data into it [1]. The copy of the data can be freely changed.

In fact, the C99 rationale document explicitly cites this as one of the reasons:

String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations.

But regardless of why, the standard is quite clear on the what. From C11 6.4.5 String literals:

7/ It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

For the latter case (an array initialised from a string literal), this is covered in 6.7.6 Declarators and 6.7.9 Initialization.


[1] Though it's worth noting that the normal "as if" rules apply here (as long as an implementation acts as if it's following the standard, it can do what it pleases).

In other words, if the implementation can detect that you never try to change the data, it can quite happily bypass the copy and use the original.

We will get Seg fault since the value is stored in the constant code segment

This is false: your program crashes because it receives a signal indicating a segmentation violation (SIGSEGV), which by default terminates the program. But that is not the underlying reason. Modifying a string literal is undefined behavior whether or not it is stored in a read-only segment, and undefined behavior covers far more than a crash.

array is stored both in constant code segment but also in the function stack.

This is an implementation detail and shouldn't concern you: as far as ISO C is concerned, those statements make no sense, because the standard does not talk about code segments or stacks. This also means it could be implemented differently.

When you write

 char ch2[] = "World";

"World", which is a string literal, is copied into ch2, something you would end up doing by hand if you used malloc and pointers. Now, why is it copied?

One reason for this may be that it's what you would expect. If you could modify such a string literal, what would happen if another part of the code referred to it and expected it to still have its original value? Sharing string literals is efficient because the same storage can be reused across your program, which saves space.

By copying it, you have your own copy of the string (you "own" it) and you can modify it as you wish.

Quoting "Rationale for American National Standard for Information Systems Programming Language C":

String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid.

This is only a partial answer, with a counter-example to the claim that a string literal is stored in read-only memory:

#include <stdio.h>

int main(void) {
   char a[]="World";
   printf("%s", a);
   return 0;
}

gcc -O6 -S cc

.LC0:
    .string "%s"                  ;; String literal stored as expected
                                  ;; in read-only area within code
    ...
    movl    $1819438935, (%rsp)   ;; First four bytes: "Worl"
    movw    $100, 4(%rsp)         ;; next two bytes: "d\0"
    call    printf
    ...

Here only the semantics of the literal are implemented; the literal "World\0" doesn't even exist.

In practice, only when a string literal is long enough will an optimizing compiler choose to memcpy the data from the literal pool to the stack, which requires the literal to exist as a null-terminated string.

The semantics of char *ch1 = "Hello"; OTOH require that a linear array exist somewhere, whose address can be assigned to the pointer ch1.
