
C++ and C-string variables, char array size

I've searched through many articles on here and have tested this concept with my own code. My question is to satisfy my own curiosity and maybe help others, as I cannot find an answer describing this concept in particular. My textbook (teaching C++) describes a C-string variable:

A C-string variable is an array of characters. The following array declaration provides a C-string variable "s" capable of storing a C-string value with nine or fewer characters:

 char s[10];

The 10 is for the nine letters in the string plus the null character '\0' to mark the end of the string. Like any other partially filled array, a C-string variable uses positions starting at indexed variable 0 through as many as are needed.

I'm trying to understand the above. If the array size is 10, wouldn't the total storage size be 11, i.e. 0-10 = 11 spaces? If the \0 character occupies one space, then we'd still be able to store 10 characters and not 9 as per the book.

In my own testing, I declared a character array test[4] and stored the word "cat" in the array. When looking at individual positions within the array, I can see an individual character at each index, i.e.:

test[0] = c
test[1] = a
test[2] = t
test[3] =  
test[4] =  

Why do we need 2 additional slots in the character array and not 1?

An array with size N has indexes starting at 0 and ending with N-1. It does not have an element with index N.

With your example of char test[4], the array has indexes 0, 1, 2, and 3. Attempting to access index 4 goes off the end of the array. C and C++ do not prevent you from doing so, but doing so invokes undefined behavior.

You can look at an array as a group of variables of the same type and size which are consecutive in memory one next to the other.

Arrays are indexed from 0 for the first element to n - 1 for the last element, so you can reach any element with just an index.

Trying to access an array with an index i >= n or a negative index i < 0 results in undefined behavior.

A character array that holds a C-string must have a null character \0 after the last character of the string.

Here is an example:

char c[5] = "Hello"; // Error in C++: no room for the terminating '\0'

The literal "Hello" has 5 characters plus the terminating \0, so it needs 6 bytes. To correct it:

char c[6] = "Hello"; //  Null character added automatically
// char c[] = "Hello";

Look at this example:

char text[6] = {'H', 'e', 'l', 'l', 'o', '\0'};

With this kind of element-by-element initialization you must add the null terminator '\0' yourself; otherwise string functions will read past the end of the array and you may get garbage characters at the end of your string.

std::cout << text[0]; // H which is the first element
std::cout << text[6 - 1 - 1]; // o, the last character before the terminator
  • Arrays of other types do not need a null terminator: all n elements hold data, and indexing still runs from 0 through n - 1:

     int array[5] = {4, 5, 9, 22, 16};
     std::cout << array[0];     // 4
     std::cout << array[5 - 1]; // 16

I think your problem comes from the misconception of how you access the memory.

s[n] accesses the value located n elements past the address s points to; it can also be written *(s + n).

So basically, declaring

char s[4];

and storing "cat" in it, you will get this layout in memory:

     +---+
  s: | c | s[0], also *(s + 0)
     +---+
     | a | s[1], also *(s + 1)
     +---+
     | t | s[2], also *(s + 2)
     +---+
     |\0 | s[3], also *(s + 3)
     +---+
     | ? | s[4], also *(s + 4): past the end of the array
     +---+
     | ? | s[5], also *(s + 5)
     +---+
     | ? | s[6], also *(s + 6)
     +---+
     | ? | s[7], also *(s + 7)
     +---+
     | ? | s[8], also *(s + 8)
     +---+
     | ? | s[9], also *(s + 9)
     +---+

The ? marks memory outside the array, whose contents we cannot know.

You can sometimes read or even modify these locations, but doing so is undefined behavior, so the result is unpredictable.

In your example, the value found at s[4] can differ each time the program runs.

char s[10]; has valid indices from 0 to 9, not 0 to 10 as you said.

A C-string variable is an array of characters.

This is slightly misleading. A C string is a sequence of character values followed by a 0-valued terminator. For example, the string "Hello" is represented as the sequence {'H', 'e', 'l', 'l', 'o', 0}. The presence of the 0 terminator is what makes the character sequence a string. All C string handling functions (strcat, strcmp, strcpy, strchr, etc.) assume the presence of that terminator; if the terminator isn't there, those routines will not function properly.

Strings are stored in arrays of character type (char for ASCII, EBCDIC, or UTF-8 strings, or wchar_t for "wide" strings [1]). Multiple strings may be stored in a single array if there's sufficient space. A 10-element array may store a single 9-character string, or two 4-character strings, or five one-character strings. Remember that to store an N-character string you need N+1 array elements to account for the terminator.

I'm trying to understand the above. If the array size is 10, wouldn't the total storage size be 11? ie 0-10 = 11 spaces.

Total storage size is 10, but array elements are indexed from 0 to 9. Given the declaration

char foo[10];

you get the following layout in memory:

     +---+
foo: |   | foo[0]
     +---+ 
     |   | foo[1]
     +---+
     |   | foo[2]
     +---+
     |   | foo[3]
     +---+
     |   | foo[4]
     +---+
     |   | foo[5]
     +---+
     |   | foo[6]
     +---+
     |   | foo[7]
     +---+
     |   | foo[8]
     +---+
     |   | foo[9]
     +---+

For any N-element array, individual elements are indexed from 0 through N-1.

Remember that in C, the array subscript operation a[i] is defined as *(a + i): given a starting address a [2], offset i elements (not bytes!) from that address and dereference the result. The first element is stored at a, the second element is stored at a + 1, the third at a + 2, etc.


  1. wchar_t was introduced to represent character sets outside the ranges defined by ASCII or EBCDIC, so it's wider than the `char` type (often the width of two `char`s). With the advent of schemes like UTF-8 to represent non-English character sets, it's not that useful and I don't see it used very often.
  2. At some point, you're going to hear someone say "an array is just a pointer". This is not correct. Under most circumstances, an expression of array type will be converted ("decay") to an expression of pointer type, and the value of the expression will be the address of the first element in the array. The array object itself is not a pointer, nor does it set aside any space for a pointer value.

char test[4] is the definition of your array. It means the array's size is four, so you can only use the elements test[0], test[1], test[2], and test[3].

You can go to http://www.cplusplus.com/doc/tutorial/arrays/ to learn more about arrays in C++.
