
Difference in length between a string and a character array in C?

I'm a beginner with C.

I'm taking a string from the user. Then I am storing the letters of the string, one by one, into a character array.

The code compiles fine.

However, when I call strlen on both of them, the results don't match.

For example, if the length of the string is 9, the length of the character array comes out as 11.

From what I understand, strings in C have a terminating null byte. So if anything, shouldn't strlen of a string be longer than that of the character array made from the same string? What is causing the difference?

#include <cs50.h>
#include <stdio.h>
#include <string.h>
#include <ctype.h>
#include <math.h>
#include <stdlib.h>

int main(int argc, string argv[])
{
    string plaintext = get_string("Enter plaintext:  ");

    printf("length of plaintext is : %zu\n", strlen(plaintext));

    char newstring[strlen(plaintext)];

    for (int i = 0, m = strlen(plaintext); i < m; i++)
    {
        newstring[i] = plaintext[i];
        printf("%c  \n", plaintext[i]);
    }

    printf("length of newstring is : %zu\n", strlen(newstring));
    printf("%s\n", newstring);
}

Thanks.

A string is a sequence of character values including a 0-valued terminator. The string "hello" is represented as the sequence {'h', 'e', 'l', 'l', 'o', 0} . C does not have a true string type - the string typedef provided by the CS50 and similar libraries is an alias for the type char * , which is not a string . Due to the decay rule, most of the time when we're dealing with strings we're dealing with expressions of type char * , but those expressions are not strings in and of themselves.

Strings are stored in arrays of character type ( char for ASCII, EBCDIC, UTF-8, etc., or wchar_t for "wide" encodings). To store an N-character string, the array must be at least N+1 elements wide to account for the terminator. The problem in your code is that you are not properly terminating newstring , so strlen goes past the end of it and keeps counting until it sees a 0-valued byte. You must define newstring as

char newstring[ strlen( plaintext ) + 1 ];

Your loop must also run one step further, through index m inclusive (i <= m rather than i < m), so that it copies the string terminator from plaintext to newstring. Alternatively, you could just use strcpy, which copies the terminator for you.

strlen gives you the number of characters in the string up to but not including the terminator - strlen( "hello" ) will return 5, but the array that stores "hello" must be at least 6 elements wide. That calculation gets a bit more complicated for variable-length encodings like UTF-8, where a single character may require more than one byte to store. I have no experience dealing with extended UTF-8 characters in C, so I won't say more about it except that it's something to be aware of.

Another thing to be aware of is that arrays in C don't automatically grow or shrink as you write to them - an array's size is fixed throughout its lifetime, so if you define an array to store 10 characters and you try to write 100 characters to it, those extra 90 characters will get written to the memory following the array, which may result in corrupted data, a runtime error, or something else. C doesn't do bounds checking on array access, so it's up to you to make sure you don't overflow a buffer (this is true for all array types, not just character arrays).

It is very important to understand that C does not have strings.

There is no type string in the language.

What C considers to be "strings" are actually always arrays of type char that are expected to be null-terminated, meaning that there will be a '\0' char somewhere in the array.

All functions that operate on strings, such as 'strlen', 'strcpy' and 'strcmp', rely on this null character to know when to stop scanning the array.

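The reason for this is that a C array does not carry its size with it: when an array is passed to a function, it decays to a pointer to its first element, and the function has no way to ask how many elements follow.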

So, if you do not have the null character, any str function will run until it finds one or tries to access memory your program is not allowed to access.

This is why some of these functions are considered unsafe - they can result in undefined behavior and allow exploits.

Another important thing to know is that C does not initialize automatic (non-static) local variables, so in your code newstring starts out with indeterminate contents.

That means unless you put a null character in there explicitly, it won't have one!

A string stored in a char array ends with the terminating character '\0', so the array needs one extra element to hold it.


 