
What is stored in memory when you declare a pointer to a char array?

Let's say that

char arr[] = "test";

I read that an array acts like a pointer to the string. Therefore, when I do:

cout << arr << endl;

I get test. When I do

char *ptr = arr;

the variable ptr should now store the address of the pointer arr. However, if I do

cout << ptr << endl;

I get test. If it is basically a pointer to a pointer, why don't I have to write this to get "test":

cout << *ptr << endl; 

Can someone explain it to me in terms of how the memory is allocated?

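What happens is that the << operator has an overload for char * that treats it as a C string: it prints the characters starting at the pointed-to address and stops at the terminating null character. Both arr (which converts to a pointer to its first element) and ptr are handled this way. When you do:

char *ptr = arr;

you simply make a new char * that points to the same target, the first character of arr. It is therefore also treated as a C string, and cout << prints the characters it points to.

This applies only to pointers to character types such as char *; code such as:

int a[] = {1, 2}; cout << a << endl;

simply prints the address of the array a (the address of its first element).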

char arr[] = "test";

An array is declared and initialised to "test"; let's say its base address is 1000.

char *ptr = arr;

A pointer to char is declared and initialised to the same base address, 1000.

cout << ptr << endl;

prints the whole string "test", because the address 1000 is passed to cout and the characters are read from there up to the terminating null. In contrast, *ptr gives the value stored at address 1000, which is a single char, so with

cout << *ptr << endl; 

only the first char, 't', is printed, which is the one stored at address 1000.

Tl;dr

In C++, the name of an array converts automatically to a pointer to its first element.

Full Answer

What gets stored in memory can vary from compiler to compiler, but let's get one compiler to tell us, gcc 6.3.0 for x86_64. The -S flag tells gcc to compile to human-readable, low-level assembly code. The -O flag tells it to optimize. We can use g++ -Wall -Wextra -Wpedantic -Wconversion -std=c++14 -O -S to compile the following file:

char arr[] = "test";
char* ptr = arr;
char* ptr2 = &arr[0];
constexpr unsigned int arr_size = sizeof(arr)/sizeof(arr[0]); // 5
char (*ptr3)[arr_size] = &arr; // A pointer to an array of arr_size chars.
char* const optimized_out = arr;

I'll edit the output a bit to make it easier to understand. A slightly-rearranged version of the file we get from this command (which ends with .s ) is as follows:

        .data


        .globl  arr
arr:
        .ascii "test\0"


        .globl  ptr
        .align 8
ptr:
        .quad   arr


        .globl  ptr2
        .align 8
ptr2:
        .quad   arr


        .globl  ptr3
        .align 8
ptr3:
        .quad   arr

So, what does this say? The .data declaration means that we are declaring the contents of the data segment of the compiled code. This is for variables whose contents we can modify.

The .globl declaration means that arr is a symbol that can be linked with other source files. The unindented lines arr: , ptr: and so on are labels representing the current address. So, when we link to arr: later, we are linking to the address, within the .data segment, of whatever bytes we tell the assembler to put there. Those are the five ASCII characters t , e , s , t and a terminating NUL.

Similarly, ptr is a global variable that is an address within the .data segment. There is a new directive here, .align 8 . This means to put the pointer on an address divisible by 8. (If gcc had actually laid the file out this way, it would need to waste three extra bytes of padding between the five bytes in the array and the aligned pointer; in fact, it put arr last so it would not need to.) On x86_64, aligned memory reads are faster than unaligned reads.

Then, a .quad , in x86_64 assembly, is a 64-bit variable, the size of a pointer. (64 bits is four times 16 bits, and the distant ancestor of the modern 64-bit desktop CPU, the 8086, was a machine with 16-bit words. So quad stands for quadword.)

What is stored in this 64-bit memory location? The value arr: , which is the address of the five-byte .ascii array.

You will notice that ptr, ptr2 and ptr3 all have identical definitions in the assembly. The standard guarantees that the name of an array decays, or implicitly converts to, a pointer to the first element of the array. And the address of an array is the same as the address of its first element; there cannot be any padding before any array element.

You cannot, in C++, assign the address of a char[] to a char* without a reinterpret_cast: char *this_does_not_work = &arr; does not compile. This is only because they have different types, though. The type of arr is char[5], and the syntax to declare ptr3 as a pointer to an array of five char objects is char (*ptr3)[5]. In this case, for “simplicity,” I defined a symbolic constant for the size of arr, in case the string we use to initialize arr changes. The size of an array divided by the size of an element is equal to the number of elements in the array. (The standard guarantees that this is always true.)

The addresses &arr , arr and &arr[0] are all guaranteed by the standard to be the same; the only difference between them is their type. You will notice that the assembly file does not actually contain any type information; this allows you to declare something like extern char* const ptr3; in another file and have it work. GCC will store that information in the symbol table, for debugging purposes, if you also give it the -g flag.

You will notice that there are two variables in the source file that have no corresponding assembly-language definitions, the constexpr variable arr_size and the const variable optimized_out . In fact, gcc will include both of these if you tell it not to optimize. With the -O flag, it won't bother to allocate memory for small constants known at compile-time; it just substitutes 5 for arr_size or arr for optimized_out . It would, however, need to store a copy of these variables somewhere in memory if you ever took their address, such as &optimized_out .

Some of this is slightly different in C than in C++.
