I've searched through many articles on here and have self-tested this concept with my own code. My question is to satisfy my own curiosity and maybe help others, as I cannot find an answer describing this concept in particular. My textbook (teaching C++) describes a C-string variable:
A C-string variable is an array of characters. The following array declaration provides a C-string variable "s" capable of storing a C-string value with nine or fewer characters:
char s[10];
The 10 is for the nine letters in the string plus the null character '\0' to mark the end of the string. Like any other partially filled array, a C-string variable uses positions starting at indexed variable 0 through as many as are needed.
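A minimal sketch of the textbook's point, assuming a hypothetical helper fill_nine and the arbitrary example value "123456789": copying a 9-character string into char s[10] fills all ten elements, because std::strcpy also copies the terminating '\0'.

```cpp
#include <cstring>

// Hypothetical helper for illustration: stores a 9-character example value
// ("123456789") in a 10-element array, as the textbook describes.
void fill_nine(char (&s)[10]) {
    // strcpy copies the 9 characters AND the terminating '\0',
    // so it writes s[0] through s[9]: all 10 elements.
    std::strcpy(s, "123456789");
}
```

After char s[10]; fill_nine(s);, std::strlen(s) gives 9 while sizeof(s) gives 10; the extra element is the '\0' stored in s[9].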
I'm trying to understand the above. If the array size is 10, wouldn't the total storage size be 11? i.e. 0-10 = 11 spaces. If the \0 character occupies one space, then we'd still be able to store 10 characters and not 9 as per the book.
In my own testing, I declared a character array test[4] and stored the word "cat" in the array. When looking at individual positions within the array, I can see individual characters at each index, i.e.:
test[0] = c
test[1] = a
test[2] = t
test[3] =
test[4] =
Why do we need 2 additional slots in the character array and not 1?
An array with size N has indexes starting at 0 and ending with N-1. It does not have an element with index N.
With your example of char test[4], the array has indexes 0, 1, 2, and 3. Attempting to access index 4 goes off the end of the array. C and C++ do not prevent you from doing so, and doing so invokes undefined behavior.
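To make this concrete, here is a small sketch using the question's own char test[4] and "cat" (store_cat and valid_index_count are hypothetical helper names): the four valid indices 0 to 3 hold 'c', 'a', 't' and '\0', so only one extra slot is needed, not two.

```cpp
#include <cstddef>
#include <cstring>

// Hypothetical helper: stores "cat" in a 4-element array, mirroring the
// question. Valid indices are 0, 1, 2 and 3; there is no test[4].
void store_cat(char (&test)[4]) {
    std::strcpy(test, "cat");  // writes 'c', 'a', 't', '\0'
}

// Hypothetical helper: number of valid indices of a built-in array T[N].
// They are numbered 0 through N - 1.
template <typename T, std::size_t N>
constexpr std::size_t valid_index_count(const T (&)[N]) {
    return N;
}
```

After store_cat(test), test[3] is '\0' and valid_index_count(test) is 4. There is no test[4], so whatever the question saw there was memory outside the array.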
You can think of an array as a group of variables of the same type and size, stored consecutively in memory one next to the other.
Arrays are indexed from 0 as the first element to n - 1 as the last element, so you can access any element just by using its index. Trying to access an array with an index i >= n or a negative index i < 0 results in undefined behavior.
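A possible bounds check, sketched with a hypothetical is_valid_index helper that deduces n from the array type: it accepts exactly the indices 0 through n - 1 and rejects both i >= n and i < 0.

```cpp
#include <cstddef>

// Hypothetical helper: reports whether i is a valid index into an array
// of N elements. Valid indices run from 0 to N - 1.
template <typename T, std::size_t N>
bool is_valid_index(const T (&)[N], std::ptrdiff_t i) {
    // A signed index type lets negative values be passed in and rejected.
    return i >= 0 && static_cast<std::size_t>(i) < N;
}
```

Taking the index as std::ptrdiff_t rather than std::size_t is a design choice for this sketch: an unsigned parameter would silently turn -1 into a huge positive number instead of letting the check see it as negative.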
Arrays of characters that hold a string need a null character '\0' right after the last character (in an exactly sized array, that is the last element).
Here is an example:
char c[5] = "Hello"; // Error in C++: no room left for the '\0'
Above, "Hello" has 5 characters plus the \0, so it is 6 bytes long. To correct it:
char c[6] = "Hello"; // Null character added automatically
// or let the compiler count the size: char c[] = "Hello";
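A short sketch of the sizes involved (the variable names are made up for illustration). In C++ the commented-out 5-element version does not compile; C, by contrast, accepts it and simply leaves out the terminator.

```cpp
#include <cstddef>

// The string literal "Hello" has 5 characters plus the '\0' the compiler
// appends, so it needs 6 elements.
constexpr char hello_sized[6] = "Hello";  // exactly enough room
constexpr char hello_auto[] = "Hello";    // compiler counts: also 6
// char too_small[5] = "Hello";           // ill-formed in C++: no room for '\0'

static_assert(sizeof(hello_auto) == 6, "5 characters + terminator");
```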
Look at this example:
char text[6] = {'H', 'e', 'l', 'l', 'o', '\0'};
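In such an initialization the compiler does not add the null terminator '\0' for you, so write it explicitly. Without it, for example char text[] = {'H', 'e', 'l', 'l', 'o'};, the array is not a string, and printing it reads past the end of the array and can show garbage characters. (If the array is declared larger than the list, as in char text[6] with only five characters, the leftover element is zero-filled anyway.)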
std::cout << text[0]; // H, the first element
std::cout << text[6 - 1 - 1]; // o, the last character before the '\0'
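The same comparison for brace initialization, as a sketch with illustrative names: only the explicit '\0' turns the character list into a C string, and std::strlen must only be called on the terminated version.

```cpp
#include <cstddef>
#include <cstring>

// Brace initialization stores exactly the listed characters;
// no '\0' is appended for you.
constexpr char with_nul[]    = {'H', 'e', 'l', 'l', 'o', '\0'};  // 6 elements, a C string
constexpr char without_nul[] = {'H', 'e', 'l', 'l', 'o'};        // 5 elements, NOT a C string
// Passing without_nul to std::strlen or std::cout would read past the end
// of the array (undefined behavior), which is where garbage output comes from.

// Length of the terminated version: strlen stops at the '\0'.
std::size_t with_nul_length() {
    return std::strlen(with_nul);
}
```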
Arrays of types other than characters don't need a null terminator; the number of elements is n, but indexing is the same, 0 through n - 1:
int array[5] = {4, 5, 9, 22, 16};
std::cout << array[0];     // 4
std::cout << array[5 - 1]; // 16
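A sketch of the int case (sum_all is a hypothetical helper): the loop bound comes from the array's size N, deduced from its type, rather than from any terminator value.

```cpp
#include <cstddef>

// Hypothetical helper: sums an int array. There is no terminator to look
// for; the element count N is part of the array's type.
template <std::size_t N>
int sum_all(const int (&values)[N]) {
    int total = 0;
    for (std::size_t i = 0; i < N; ++i) {  // valid indices: 0 .. N - 1
        total += values[i];
    }
    return total;
}
```

For the array above, sum_all(array) returns 56 (4 + 5 + 9 + 22 + 16), visiting indices 0 through 4 only.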
I think your problem comes from a misconception about how you access memory.
s[n] means accessing the value at the address s plus n elements; it can also be written *(s + n).
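To check that equivalence, here is a hypothetical helper that walks every valid index n of an array and confirms that s[n] and *(s + n) refer to the same element at the same address.

```cpp
#include <cstddef>

// Hypothetical check: for every valid index n (0 .. N - 1), s[n] and
// *(s + n) must be the same element, so both the addresses and the
// values have to match.
template <std::size_t N>
bool subscript_matches_pointer(const char (&s)[N]) {
    for (std::size_t n = 0; n < N; ++n) {
        if (&s[n] != s + n || s[n] != *(s + n)) {
            return false;
        }
    }
    return true;
}
```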
So basically, if you declare
char s[4];
and store "cat" in it, you will get this layout in memory:
+---+
s: | c | s[0] also *(s + 0)
+---+
| a | s[1] also *(s + 1)
+---+
| t | s[2] also *(s + 2)
+---+
|\0 | s[3] also *(s + 3)
+---+
| ? | s[4] also *(s + 4)
+---+
| ? | s[5] also *(s + 5)
+---+
| ? | s[6] also *(s + 6)
+---+
| ? | s[7] also *(s + 7)
+---+
| ? | s[8] also *(s + 8)
+---+
| ? | s[9] also *(s + 9)
+---+
The ? stands for a value we can't be sure of.
Nothing stops you from writing code that reads or even modifies those locations, but they are outside the array, so doing so is undefined behavior. In your example, the value you see at s[4] can change each time the executable runs.
char s[10];
has valid indices from 0 to 9, not 0 to 10 as you said.
A C-string variable is an array of characters.
This is slightly misleading. A C string is a sequence of character values followed by a 0-valued terminator. For example, the string "Hello" is represented as the sequence {'H', 'e', 'l', 'l', 'o', 0}. The presence of the 0 terminator makes the character sequence a string. All C string handling functions (strcat, strcmp, strcpy, strchr, etc.) assume the presence of that terminator; if the terminator isn't there, those routines will not function properly, because they keep reading (or writing) past the end of the array.
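To see why those functions depend on the terminator, here is a hand-written equivalent of strlen (my_strlen is an illustrative name, not a library function): it counts characters until it meets the 0 value and has no other way to know where the string ends.

```cpp
#include <cstddef>

// Illustrative re-implementation of strlen. The loop stops only at '\0';
// if the terminator is missing, it runs off the end of the array
// (undefined behavior), exactly like the real library function.
std::size_t my_strlen(const char* s) {
    std::size_t n = 0;
    while (s[n] != '\0') {
        ++n;
    }
    return n;
}
```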
Strings are stored in arrays of character type (char for ASCII, EBCDIC, or UTF-8 strings, or wchar_t for "wide" strings [1]). Multiple strings may be stored in a single array if there's sufficient space. A 10-element array may store a single 9-character string, or two 4-character strings, or five 1-character strings. Remember that to store an N-character string you need N+1 array elements to account for the terminator.
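A sketch of the two-strings case (store_two and the values "abcd" and "efgh" are hypothetical): each 4-character string takes 4 + 1 = 5 elements, so the first occupies indices 0 to 4 and the second occupies indices 5 to 9.

```cpp
#include <cstring>

// Hypothetical example: packing two 4-character strings into one
// 10-element array. Each needs 4 + 1 = 5 elements, so they fit exactly.
void store_two(char (&buf)[10]) {
    std::strcpy(buf, "abcd");      // buf[0..3] = "abcd", buf[4] = '\0'
    std::strcpy(buf + 5, "efgh");  // buf[5..8] = "efgh", buf[9] = '\0'
}
```

Since buf + 5 is the address of buf[5], it can be passed to any C string function as the start of the second string.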
I'm trying to understand the above. If the array size is 10, wouldn't the total storage size be 11? ie 0-10 = 11 spaces.
Total storage size is 10, but array elements are indexed from 0 to 9. Given the declaration
char foo[10];
you get the following layout in memory:
+---+
foo: | | foo[0]
+---+
| | foo[1]
+---+
| | foo[2]
+---+
| | foo[3]
+---+
| | foo[4]
+---+
| | foo[5]
+---+
| | foo[6]
+---+
| | foo[7]
+---+
| | foo[8]
+---+
| | foo[9]
+---+
For any N-element array, individual elements are indexed from 0 through N-1.
Remember that in C, the array subscript operation a[i] is defined as *(a + i): given a starting address a [2], offset i elements (not bytes!) from that address and dereference the result. The first element is stored at a, the second element at a + 1, the third at a + 2, etc.
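A small sketch of the "elements, not bytes" point, using an int array (byte_step_for_int is a hypothetical name): converting both addresses to const char* measures the gap in bytes, which comes out to sizeof(int) rather than 1.

```cpp
#include <cstddef>

// Hypothetical helper: how many bytes separate a + 1 from a for an int
// array. Pointer arithmetic counts in elements, so a + 1 lies
// sizeof(int) bytes past a.
std::ptrdiff_t byte_step_for_int() {
    int a[3] = {10, 20, 30};
    const char* first  = reinterpret_cast<const char*>(a);
    const char* second = reinterpret_cast<const char*>(a + 1);
    return second - first;  // distance measured in bytes (chars)
}
```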
[1] wchar_t was introduced to represent character sets outside the ranges defined by ASCII or EBCDIC, so it's wider than the char type (often the width of two chars). With the advent of schemes like UTF-8 to represent non-English character sets, it's not that useful and I don't see it used very often.
char test[4]
is the definition of your array: the array's size is four, but you can only use the elements test[0], test[1], test[2], and test[3].
You can go to http://www.cplusplus.com/doc/tutorial/arrays/ to learn more about arrays in C++.