How does strlen() calculate the length of a string not defined in the current function?

From what I understand, in C, arrays and pointers to strings do not implicitly contain information about the number of elements in the respective array. This is why argc must be passed to main() in addition to argv. However, on my machine the following code correctly prints the length of the string "Hello", which is 5, even though the string is not defined in the same scope as printf().

#include <stdio.h>
#include <string.h>

void p1(char *a)
{
    printf("%lu\n", strlen(a));
}

void p2(char a[])
{
    printf("%lu\n", strlen(a));
}

int main(int argc, char *argv[])
{
    char *a = "Hello";
    char b[] = "Hello";
    char c[] = {'H', 'e', 'l', 'l', 'o', '\0'};
    char d[] = {'H', 'e', 'l', 'l', 'o'};

    p1(a);
    p2(a);

    printf("\n");

    p1(b);
    p2(b);

    printf("\n");

    p1(c);
    p2(c);

    printf("\n");

    p1(d);
    p2(d);

    return 0;
}

Here are my results and machine type:

$ ./a.out
5
5

5
5

5
5

6
6

$ uname -a
Linux loathe 3.16.0-37-generic #51-Ubuntu SMP Tue May 5 13:45:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

and

$ cat /etc/issue
Ubuntu 14.10 \n \l

How is strlen() determining the string length? Why does this work even when the string is explicitly sent as a pointer, as in the function call p1(a)? If this behaviour is only in newer versions of gcc and clang, when did it start, and can I rely on it in my programs?


EDIT:

Additionally, why does strlen() report an additional character when the null byte is not included in the string? (added in edit)

Edit: I've addressed the assertion that strlen() works by finding a null byte.

How is strlen() determining the string length?

strlen() searches for the terminating null character ('\0') and counts the characters (i.e., the number of bytes) from the beginning (the pointer it is given) up to the null, excluding the null itself.

Remember, C-style strings are null-terminated by definition.
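To see the terminator concretely, here is a small sketch (the names greeting, bytes_in_greeting and length_of_greeting are made up for illustration). When an array is initialised from a string literal, the compiler appends the '\0', so sizeof counts that byte while strlen does not:

```c
#include <string.h>

/* Illustrative array: the literal "Hello" supplies 5 characters plus '\0'. */
char greeting[] = "Hello";

/* Bytes the array occupies, terminator included. */
size_t bytes_in_greeting(void) { return sizeof greeting; }

/* Characters before the first '\0'. */
size_t length_of_greeting(void) { return strlen(greeting); }
```

Here bytes_in_greeting() returns 6, length_of_greeting() returns 5, and greeting[5] is the '\0' that strlen stopped at.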

Just as a note: since the return type of strlen() is size_t, the %zu format specifier should be used to print the return value.
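For example, the call in p1 and p2 would become printf("%zu\n", strlen(a));. Because checking stdout is awkward, the sketch below (format_length is a hypothetical helper) uses snprintf with the same conversion; %zu needs a C99 or later compiler and C library.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes strlen(s) into buf as decimal text.
   %zu is the conversion that matches size_t; %lu only matches where
   size_t happens to be unsigned long, as it is on x86_64 Linux. */
int format_length(char *buf, size_t bufsize, const char *s)
{
    return snprintf(buf, bufsize, "%zu", strlen(s));
}
```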


EDIT:

If a char array is not null-terminated, it does not qualify as a string. Passing that kind of array to any string-related library function invokes undefined behaviour.
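If you genuinely have to handle a buffer that may lack a terminator, one option is to bound the search by the array's size. The sketch below relies on the standard memchr, which never inspects more bytes than the count it is given; bounded_length and is_terminated are hypothetical names. POSIX systems also provide strnlen for the bounded length.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: length of the text in buf, looking at no more
   than size bytes. Returns size when no '\0' is found, instead of
   reading past the end the way strlen() would. */
size_t bounded_length(const char *buf, size_t size)
{
    const char *nul = memchr(buf, '\0', size);
    return nul ? (size_t)(nul - buf) : size;
}

/* Hypothetical helper: nonzero only if buf holds a real string
   within its first size bytes. */
int is_terminated(const char *buf, size_t size)
{
    return memchr(buf, '\0', size) != NULL;
}
```

With the question's arrays, bounded_length(c, sizeof c) and bounded_length(d, sizeof d) both give 5, but for d that is only because the search stopped at the end of the array, which is why is_terminated(d, sizeof d) is worth checking separately.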

For more info, check the man page for strlen().

Just read the standard. It clearly defines what strlen does. It is your assertion that is wrong.

Your char array d invokes undefined behaviour (last sentence) when used with the string functions.

Additionally, why does strlen() report an additional character when the null byte is not included in the string? (added in edit)

To answer your edit explicitly: your assumption is wrong. It could return 42, crash, or even delete your files. Read about undefined behaviour, as suggested by previous comments.

The technical reason it reports 6 is that, by accident, in your final program the second byte after the end of your array happens to be a 0 byte. That doesn't make the array a valid C string.

First of all, your functions p1 and p2 are exactly identical. It is not the case that one of them treats the string as a pointer and the other treats it as an array. A string is, by definition, an array of characters. But whenever we manipulate arrays (and, in particular, when we pass an array to a function), what we're actually using is a pointer to the array's first element. So your function p2, even though it looks like it's receiving an array, is actually receiving a pointer. See also this question in the C FAQ list.
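One way to convince yourself that an array parameter really is a pointer is the sketch below (param_is_char_pointer is a made-up name, and _Generic requires C11). It declares the function with pointer syntax and defines it with array syntax, which the compiler accepts only because the two mean the same thing in a parameter list:

```c
/* Declared with pointer syntax... */
int param_is_char_pointer(char *a);

/* ...and defined with array syntax. The two declarations are compatible
   because "char a[]" in a parameter list is adjusted to "char *a". */
int param_is_char_pointer(char a[])
{
    /* _Generic selects a branch from the static type of a: char *. */
    return _Generic(a, char *: 1, default: 0);
}
```

In main, sizeof b for the question's array b still gives 6; inside the function only a pointer is left, which is why strlen has to search for the terminator.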

Now, secondly, a C string does explicitly contain an indication of its length: every proper string contains a null terminator, the special character '\0', which marks its end. The way to determine a string's length is to read through it, counting characters, until you find the '\0'. That's exactly what strlen does.

Finally, we come to your array d. As I think you know, you have arranged for this array to look sort of like the string "Hello", but without the terminating null character. So when strlen tries to compute the length of d, it's going to sail right past the end and start inspecting whatever random memory exists past the end of the array. There are three things that might happen:

  • The next thing in memory right after the array might just happen to be a 0 byte. In this case, by pure chance, strlen will compute the correct answer, 5.
  • It's more likely that there are some other random bytes in memory following the array, and that somewhere within them is a 0 byte. In that case, strlen will compute an answer greater (perhaps much greater) than 5. That's what appears to have happened.
  • Before finding a 0 byte, strlen might run past the end of the memory allocated to your process, in which case you will get some operating-system-level error, such as a segmentation violation, a bus error, "This program has terminated unexpectedly", or a Blue Screen of Death.
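The fix is to leave room for the terminator. Here are two possible ways to declare d so that it is a real string (d6 and d_lit are illustrative names): give the array a sixth element, which is set to zero because it has no explicit initialiser, or initialise it from a string literal.

```c
#include <string.h>

/* 6 elements but only 5 initialisers: the remaining element is
   zero-initialised, so d6[5] is '\0'. */
char d6[6] = {'H', 'e', 'l', 'l', 'o'};

/* A string literal supplies the '\0' itself. */
char d_lit[] = "Hello";

size_t len_d6(void)    { return strlen(d6); }
size_t len_d_lit(void) { return strlen(d_lit); }
```

Both give 5 with no reliance on whatever happens to follow the array in memory.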

An implementation of strlen:

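#include <stddef.h>

/* Named my_strlen: defining your own function called strlen would
   clash with the standard library's. */
size_t my_strlen(const char *str) {
    const char *s;
    for (s = str; *s; ++s)    /* advance until *s is the '\0' terminator */
        ;
    return (size_t)(s - str); /* number of characters before the '\0' */
}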

The loop termination condition *s is shorthand for *s != '\0'. So it loops until the terminating null character is found. And it behaves the same for both a pointer to char and a char array.
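To check that claim, here is the same scan with the comparison spelled out (scan_length is a hypothetical name). Called on a pointer to a literal and on an array it returns 5 either way, and it stops at the first '\0' even when more bytes follow it, exactly as strlen would:

```c
#include <stddef.h>

/* Same algorithm as the loop above, with the '\0' test written out. */
size_t scan_length(const char *str)
{
    const char *s = str;
    while (*s != '\0')  /* stop at the first terminator */
        ++s;
    return (size_t)(s - str);
}
```

For char embedded[] = "Hel\0lo"; the array is 7 bytes long, but scan_length(embedded) is 3, because only the bytes before the first '\0' count.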
