
How does strlen() calculate the length of a string not defined in the current function?

From what I understand, in C, arrays and pointers to strings do not implicitly carry information about the number of elements in the respective array. This is why argc must be passed to main() in addition to argv. However, on my machine the following code correctly prints the length of the string "Hello", which is 5, even though the string is not defined in the same scope as the call to printf().

#include <stdio.h>
#include <string.h>

void p1(char *a)
{
    printf("%lu\n", strlen(a));
}

void p2(char a[])
{
    printf("%lu\n", strlen(a));
}

int main(int argc, char *argv[])
{
    char *a = "Hello";
    char b[] = "Hello";
    char c[] = {'H', 'e', 'l', 'l', 'o', '\0'};
    char d[] = {'H', 'e', 'l', 'l', 'o'};

    p1(a);
    p2(a);

    printf("\n");

    p1(b);
    p2(b);

    printf("\n");

    p1(c);
    p2(c);

    printf("\n");

    p1(d);
    p2(d);

    return 0;
}

Here are my results and machine type:

$ ./a.out
5
5

5
5

5
5

6
6

$ uname -a
Linux loathe 3.16.0-37-generic #51-Ubuntu SMP Tue May 5 13:45:59 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux

and

$ cat /etc/issue
Ubuntu 14.10 \n \l

How is strlen() determining the string length? Why does this work even when the string is explicitly passed as a pointer, as in the function call p1(a)? If this behaviour exists only in newer versions of gcc and clang, when did it start, and can I rely on it in my programs?


EDIT:

Additionally, why does strlen() report an additional character when the null byte is not included in the string? (added in edit)

Edit: I've addressed the assertion that strlen() works by finding a null byte.

How is strlen() determining the string length?

strlen() searches for the terminating null character ('\0') and counts the characters (i.e., the number of bytes) from the beginning (the pointer it was given) up to the null, excluding the null itself.

Remember, C-style strings are by definition null-terminated.

Just as a note, since the return type of strlen() is size_t, the %zu format specifier should be used to print the return value.


EDIT:

If a char array is not null-terminated, it does not qualify as a string. Passing that kind of array to any string-related library function invokes undefined behaviour.

For more info, check the man page for strlen().

Just read the standard. It clearly defines what strlen does. It is your assertion that is wrong.

Your char array d invokes undefined behaviour when used with the string functions, because it is not null-terminated.

Additionally, why does strlen() report an additional character when the null byte is not included in the string? (added in edit)

To answer your edit explicitly: your assumption is wrong. It could return 42, crash, or even delete your files. Read about undefined behaviour, as suggested by previous comments.

The technical reason it reports 6 is that, purely by accident, in your final program the second byte past the end of your array happens to be 0. That doesn't make the array a valid C string.

First of all, your functions p1 and p2 are exactly identical. It is not the case that one of them treats the string as a pointer and the other treats it as an array. A string is, by definition, an array of characters. But whenever we manipulate arrays (and, in particular, when we pass an array to a function), what we're actually using is a pointer to the array's first element. So your function p2, even though it looks like it's receiving an array, is actually receiving a pointer. See also this question in the C FAQ list.

Now, secondly, a C string does contain, explicitly, an indication of its length: all proper strings contain a null terminator, the special character '\0', which marks its end. The way to determine a string's length is to read through it, counting characters, until you find the '\0'. That's exactly what strlen does.

Finally, we come to your array d. As I think you know, you have arranged for this array to look sort of like the string "Hello", but without the terminating null character. So when strlen tries to compute the length of d, it's going to sail right past the end and start inspecting whatever random area of memory exists past the end of the array. There are three things that might happen:

  • The next thing in memory right after the array might just happen to be a 0 byte. In this case, by pure chance, strlen will just happen to compute the correct answer, 5.
  • It's more likely that there are some other random bytes in memory following the array, and that somewhere within them is a 0 byte. In that case, strlen will compute an answer greater (perhaps much greater) than 5. That's what appears to have happened.
  • Before finding a 0 byte, strlen might run past the end of the memory allocated to your process, in which case you will get some operating-system-level error message, such as a Segmentation violation or Bus error or "This program has terminated unexpectedly" or Blue Screen Of Death.

An implementation of strlen

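#include <stddef.h>
size_t strlen(const char *str) {
    const char *s;
    /* Advance s until it points at the terminating '\0'. */
    for (s = str; *s; ++s)
        ;
    /* The distance from the start is the length, excluding the '\0'. */
    return (size_t)(s - str);
}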

The loop condition *s is equivalent to *s != '\0'. So it loops until the terminating null character is found. And it behaves the same for both a pointer to char and a char array.


 