
Why is max length of C string literal different from max char[]?

Clarification: Given that a string literal can be rewritten as a const char[] (see below), imposing a lower max length on literals than on char[]s is just a syntactic inconvenience. Why does the C standard encourage this?


The C89 standard has a translation limit for string literals:

509 characters in a character string literal or wide string literal (after concatenation)

There isn't a limit for char arrays; perhaps

32767 bytes in an object (in a hosted environment only)

applies (I'm not sure what object or hosted environment means), but at any rate it's a much higher limit.

My understanding is that a string literal is equivalent to a char array containing the same characters, i.e. it's always possible to rewrite something like this:

const char* str = "foo";

into this:

static const char __THE_LITERAL[] = { 'f', 'o', 'o', '\0' };
const char* str = __THE_LITERAL;

So why such a hard limit on literals?

The limit on string literals is a compile-time requirement; there's a similar limit on the length of a logical source line. A compiler might use a fixed-size data structure to hold source lines and string literals.

(C99 increases these particular limits from 509 to 4095 characters.)

On the other hand, an object (such as an array of char) can be built at run time. The limits on objects are likely imposed by the target machine architecture, not by the design of the compiler.

Note that these are not upper bounds imposed on programs. A compiler is not required to impose any finite limits at all. If a compiler does impose a limit on line length, it must be at least 509 characters (C89) or 4095 characters (C99). (Most actual compilers, I think, don't impose fixed limits; rather they allocate memory dynamically.)
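To make the fixed-size versus dynamic distinction concrete, here is a minimal sketch of how a scanner could collect the body of a string literal either way. Both functions and the FIXED_LITERAL_MAX name are made up for illustration and are not taken from any real compiler; escape sequences are ignored to keep it short.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: copies the text between the double quotes at the
 * start of `src` into a heap buffer that grows on demand, so there is no
 * built-in length limit. Returns NULL on malformed input or allocation
 * failure; the caller frees the result. */
char *read_literal_dynamic(const char *src)
{
    size_t cap = 16, len = 0;
    char *buf, *tmp;

    if (*src++ != '"')
        return NULL;
    buf = malloc(cap);
    if (buf == NULL)
        return NULL;
    while (*src != '"') {
        if (*src == '\0') {          /* unterminated literal */
            free(buf);
            return NULL;
        }
        if (len + 1 >= cap) {        /* keep room for the terminator */
            cap *= 2;
            tmp = realloc(buf, cap);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
        }
        buf[len++] = *src++;
    }
    buf[len] = '\0';
    return buf;
}

/* Fixed-size variant: a scanner built this way has a hard limit.
 * 509 is the minimum that C89 requires an implementation to accept. */
#define FIXED_LITERAL_MAX 509

/* Returns the literal's length, or -1 if it is malformed or does not fit. */
int read_literal_fixed(const char *src, char out[FIXED_LITERAL_MAX + 1])
{
    size_t len = 0;

    if (*src++ != '"')
        return -1;
    while (*src != '"') {
        if (*src == '\0' || len == FIXED_LITERAL_MAX)
            return -1;
        out[len++] = *src++;
    }
    out[len] = '\0';
    return (int)len;
}
```

Given a 600-character literal, read_literal_fixed rejects it while read_literal_dynamic returns all 600 characters. The standard's minimum exists so that the fixed-size design remains a conforming choice.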

It's not that 509 characters is the limit for a string; it's the minimum a compiler must support for ANSI compatibility, as explained here.

I think that the makers of the standard pulled the number 509 out of their ass, but unless we get some official documentation about this, there is no way for us to know.

As for how many characters can actually be in a string literal, that is compiler-dependent.

Here are some examples:

  • MSVC: 2048
  • GCC: No limit (at least up to 100,000 characters), but it gives a warning once the literal exceeds 509 characters:

    String literal of length 100000 exceeds maximum length 509 that C90 compilers are required to support
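If you want to probe your own compiler, one way is to build an over-long literal out of concatenated pieces, since the limit applies after concatenation. This is only a sketch: the macro and function names are made up, and whether you see a warning depends on the compiler and on options such as a pedantic C90 mode.

```c
#include <stddef.h>

/* Adjacent string literals are concatenated during translation, so these
 * (made-up) macros expand to a single literal of 10, 100 and 1000
 * characters respectively. */
#define S10   "0123456789"
#define S100  S10 S10 S10 S10 S10 S10 S10 S10 S10 S10
#define S1000 S100 S100 S100 S100 S100 S100 S100 S100 S100 S100

/* 1000 characters is more than the 509 that C89 guarantees, but less than
 * the 4095 that C99 guarantees. */
static const char long_text[] = S1000;

/* Number of characters in the literal, excluding the terminating '\0'. */
size_t long_text_length(void)
{
    return sizeof long_text - 1;
}
```

Adding more levels (S10000 and so on) in the same pattern pushes the literal past the C99 minimum as well.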

Sorry about the late answer, but I'd like to illustrate the difference between the two cases. (Richard J. Ross already pointed out that they're not equivalent.)

Suppose you try this:

const char __THE_LITERAL[] = { 'f', 'o', 'o', '\0' };
const char* str = __THE_LITERAL;
char *str_writable = (char *) str;  // Not so const anymore
str_writable[0] = 'g';

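Now str points to "goo" (on my platform, at least; strictly speaking, modifying an object that was defined as const is undefined behavior, even through a cast).

But if you do this: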

const char* str = "foo";
char *str_writable = (char *) str;
str_writable[0] = 'g';

Result: segfault! (On my platform, at least.)

Here is the fundamental difference: in the first case you have an array which is initialized to "foo", but in the second case you have an actual string literal, which the implementation is free to place in read-only memory.
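If the goal is a string you can actually modify, the well-defined route is to drop the const and let the array be initialized from the literal. A minimal sketch, where make_goo is a made-up helper name:

```c
#include <string.h>

/* Hypothetical helper: edits a private copy of "foo" and stores the result
 * in `out`, which must have room for at least 4 bytes. */
void make_goo(char *out)
{
    char buf[] = "foo";        /* array initialized from the literal: a writable copy */
    const char *lit = "foo";   /* points at the literal itself: treat as read-only */

    buf[0] = 'g';              /* well defined, buf is a non-const array */
    /* ((char *)lit)[0] = 'g';    would be undefined behavior */
    (void)lit;                 /* silence the unused-variable warning */

    strcpy(out, buf);
}
```

No cast is needed for the array version, which is a good sign that nothing is being circumvented.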

On a side note,

const char __THE_LITERAL[] = { 'f', 'o', 'o', '\0' };

is exactly equivalent to

const char __THE_LITERAL[] = "foo";

Here the = introduces an array initializer rather than an assignment. This is very different from

const char *str = "foo";

where the address of the string literal is assigned to str.
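One observable consequence of that difference is what sizeof reports. The variable and function names below are made up for the sketch:

```c
#include <stddef.h>

static const char as_array[] = "foo";    /* array of 4 chars: 'f', 'o', 'o', '\0' */
static const char *as_pointer = "foo";   /* pointer to the literal's first char */

/* Size of the array object, terminator included: always 4 here. */
size_t array_size(void)
{
    return sizeof as_array;
}

/* Size of the pointer itself, which has nothing to do with the string's length. */
size_t pointer_size(void)
{
    return sizeof as_pointer;
}
```

array_size() returns 4 on every implementation, while pointer_size() returns sizeof(const char *), commonly 4 or 8 depending on the platform.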
