
c casting a pointer to a char array as an int* gives the same result as casting a char* as an int*

Consider the following C program:

#include <stdio.h>
#include <stdlib.h>
int main(){
    char c[1] = {'Q'};
    printf("%c ",*(char*)(c));   // line 1
    printf("%c\n",*(char*)(&c));  // line 2
}

The output is Q Q

Here is my understanding of what should happen. c is a pointer to a char, so line 1 should print the letter Q: a pointer to a char is cast to a pointer to a char (so nothing happens) and is then dereferenced. Because the char that c points to is 'Q', line 1 prints 'Q'. This makes sense to me.

However, line 2 does not. The address of c is cast to a pointer to a char, so I believe that after dereferencing, the expression *(char*)(&c) should reduce to the value of the pointer c, reinterpreted as a char.

Both lines give the same result, and I don't think it is a coincidence because I've tried it with many different letters. I'd like to know why this is. Thanks

PS:
I tried this:

#include <stdio.h>
#include <stdlib.h>
int main(){
    char c[10] = "asdf";
    printf("%c ",*(char*)(c));   // line 1
    printf("%c\n",*(char*)(&c));  // line 2
}

and I got this: a a

In the statement

char c[1] = {'Q'};

c is a char array, not a pointer. In most expressions the array name c converts ("decays") to a pointer to its first element, i.e. the base address of the array. If you print c and &c, both give the same address.

Side note: c decays to a pointer to the first element of the array (type char*), while &c is a pointer to the whole array (type char (*)[1]). The address is the same; only the type differs.

#include <stdio.h>

int main(void){
        char c[1] = {'Q'};
        printf("%p %p\n",(void*)c,(void*)&c); /* both print the same address */
        return 0;
}

That's why both *(char*)(c) and *(char*)(&c) yield the same result. For example:

char c[10] = "asdf"; /* lets assume base address of c is 0x100 */

It looks like

 --------------------------------------
 |  a   |   s   |   d   |  f   | \0   |
 --------------------------------------
0x100   0x101   0x102 ..
c

Next, here is how the two expressions *(char*)(c) and *(char*)(&c) are evaluated.

*(char*)(c)  =>  *(char*)(0x100)  => c decays to a char* pointing at the first element, so the cast changes nothing
             =>   *(0x100)        => the one byte stored at 0x100 => a

and

*(char*)(&c)     =>  *(char*)(0x100)   => &c has type char (*)[10] but holds the same address 0x100; the cast converts it to char*
                 =>   *(0x100)          => the one byte stored at 0x100 => a

If you think about the equality (void*) c == (void*) &c, you will realize that it is logically consistent. Take for instance a statically allocated integer:

int myvar = 2;

Printing the value of myvar makes sense, since it's an integer and it's "universally" recognizable.

Now how can you identify an array? Obviously by the address of its storage.

But how can you print it? There is no rule for that. What would be the point of making c and &c evaluate to different addresses if there is no reason to? One could speculate that c could be interpreted as the first element of the array, but you can imagine all the drawbacks of that choice.

The compiler actually reasons differently: variable names do not exist in the generated code, so it replaces them with something it can work with, such as offsets from a base register.

Take for example this snippet:

int a = 2;
char c[6] = {'a', 'b', 'c', 'd', 'e', '\0'};

printf("%p\n", (void*) c);
printf("%p\n", (void*) &c);
printf("%d\n", a);
printf("%p\n", (void*) &a);

Here is the assembly that gcc generates (Intel syntax):

    mov     DWORD PTR [rbp-4], 2     # int a = 2;
    mov     BYTE PTR [rbp-16], 97    # char c[6] = {'a', 'b', 'c', 'd', 'e', '\0'};
    mov     BYTE PTR [rbp-15], 98
    mov     BYTE PTR [rbp-14], 99
    mov     BYTE PTR [rbp-13], 100
    mov     BYTE PTR [rbp-12], 101
    mov     BYTE PTR [rbp-11], 0
    lea     rax, [rbp-16]            # printf("%p\n",  c);
    mov     rsi, rax
    mov     edi, OFFSET FLAT:.LC0
    mov     eax, 0
    call    printf
    lea     rax, [rbp-16]            # printf("%p\n", (void*) &c);
    mov     rsi, rax
    mov     edi, OFFSET FLAT:.LC0
    mov     eax, 0
    call    printf
    mov     eax, DWORD PTR [rbp-4]   # printf("%d\n", a);
    mov     esi, eax
    mov     edi, OFFSET FLAT:.LC1
    mov     eax, 0
    call    printf
    lea     rax, [rbp-4]             # printf("%p\n", (void*) &a);
    mov     rsi, rax
    mov     edi, OFFSET FLAT:.LC0
    mov     eax, 0
    call    printf

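For both c and &c the compiler emits the same instruction, lea rax, [rbp-16]: the two expressions evaluate to the same address, even though their types differ.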
