Why does casting the pointer change the value at the address?

So I was doing an exercise to see if I was using memset correctly.

Here's the original code I wrote, which was supposed to memset some addresses to the value 50:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    int *block1 = malloc(2048);
    memset(block1, 50, 10);
    // int count = 0;
    for (int *iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) ){
        printf("%p : %d\n", (void *) iter, *iter);
    }
    return 0;
}

I expected every address in memory to store the value 50. HOWEVER my output was:

(Address: Value)

0x14e008800 : 842150450
0x14e008801 : 842150450
0x14e008802 : 842150450
0x14e008803 : 842150450
0x14e008804 : 842150450
0x14e008805 : 842150450
0x14e008806 : 842150450
0x14e008807 : 3289650
0x14e008808 : 12850
0x14e008809 : 50

I was stuck on the problem for a while and tried a bunch of things until I randomly decided that maybe my pointer was the problem. I then tried a uint8_t pointer.

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int main(){
    uint8_t *block1 = malloc(2048);
    memset(block1, 50, 10);
    for (uint8_t  *iter = block1; iter < block1 + 10; iter++ ){
        printf("%p : %d\n", (void *) iter, *iter);
    }
    return 0;
}

All I did was change the type of the block1 variable and my iter variable to be uint8_t pointers instead of int pointers and I got the correct result!

0x13d808800 : 50
0x13d808801 : 50
0x13d808802 : 50
0x13d808803 : 50
0x13d808804 : 50
0x13d808805 : 50
0x13d808806 : 50
0x13d808807 : 50
0x13d808808 : 50
0x13d808809 : 50

My question is then, why did that make such a difference?


Because the exact type of a pointer is hugely important. Pointers in C are not just memory addresses. Pointers are memory addresses, along with a notion of what type of data is expected to be found at that address.

If you write

uint8_t *p;
... p = somewhere ...
printf("%d\n", *p);

then in that last line, *p fetches the one byte of memory pointed to by p.

But if you write

int *p;
... p = somewhere ...
printf("%d\n", *p);

where, yes, the only change is the type of the pointer, then in that exact same last line, *p now fetches four bytes of memory pointed to by p, interpreting them as a 32-bit int. (This assumes int on your machine is four bytes, which is pretty common these days.)

When you called

memset(block1, 50, 10);

you were asking for some (though not all) of the individual bytes of memory in block1 to be set to 50.

When you used an int pointer to step over that block of memory, fetching (as we said earlier) four bytes of memory at a time, you got 4-byte integers where each of the 4 bytes contained the value 50. So the value you got was

(((((50 << 8) | 50) << 8) | 50) << 8) | 50

which just happens to be exactly 842150450.

Or, looking at it another way, if you take that value 842150450 and convert it to hex (base 16), you'll find that it's 0x32323232, where 0x32 is the hexadecimal value of 50, again showing that we have four bytes each with the value 50.

Now, that all makes sense so far, although you were skating on thin ice in your first program. You had int *iter, but then you said

for(iter = block1; (uint8_t *) iter < (uint8_t *)block1 + 10; iter = (int *) ((uint8_t *)iter + 1) )

In that cumbersome increment expression

iter = (int *) ((uint8_t *)iter + 1)

you have contrived to increment the address in iter by just one byte. Normally, we say

iter = iter + 1

or just

iter++

and this means to increment the address in iter by several bytes, so that it points at the next int in a conventional array of int.

Doing it the way you did had three implications:

  1. You were accessing a sort of sliding window of int-sized subblocks of block1. That is, you fetched an int made from bytes 1, 2, 3, and 4, then an int made from bytes 2, 3, 4, and 5, then an int made from bytes 3, 4, 5, and 6, etc. Since all the bytes had the same value, you always got the same value, but this is a strange and generally meaningless thing to do.
  2. Three out of four of the int values you fetched were unaligned. It looks like your processor let you get away with this, but some processors would have given you a Bus Error or some other kind of memory-access exception, because unaligned accesses aren't always allowed.
  3. You also violated the rule about strict aliasing.

The function memset sets each byte of the supplied memory to the specified value.

So in this call

memset(block1, 50, 10);

10 bytes of the memory addressed by the pointer block1 were set to the value 50.

But when you dereference the pointer iter, which has the type int *, you read and output sizeof( int ) bytes at once, starting at the address it points to.

On the other hand, if you declare the pointer as having the type

uint8_t  *iter;

then you will output only one byte of memory.

Consider the following demonstration program.

#include <stdio.h>
#include <string.h>

int main( void ) 
{
    int x;
    memset( &x, 50, sizeof( x ) );

    printf( "x = %d\n", x );

    for ( const char *p = ( const char * )&x; p != ( const char * )&x + sizeof( x ); ++p )
    {
        printf( "%d", *p ); 
    }
    putchar( '\n' );
}

The program output is

x = 842150450
50505050

That is, each byte of the memory occupied by the integer variable x was set to 50.

If you output each byte separately, the program prints the value 50 for each of them.

To make it even more clear consider one more demonstration program.

#include <stdio.h>

int main( void ) 
{
    printf( "50 in hex is %#x\n", 50 );
    int x = 0x32323232;
    printf( "x = %d\n", x );
}

The program output is

50 in hex is 0x32
x = 842150450

That is, the value 50 in hexadecimal is 0x32.

Thus this initialization

int x = 0x32323232;

yields the same result (assuming a 4-byte int) as the call of the function memset

memset( &x, 50, sizeof( x ) );

which you could equivalently write as

memset( &x, 0x32, sizeof( x ) );

In the first case you are dereferencing the int *iter, so it prints the (misaligned) int value at the address, not the byte value.

It is clear what is happening when you look at the value 842150450 in hexadecimal (0x32323232): each byte of the integer is 0x32 (50 decimal). The bytes after the tenth byte are indeterminate, but they happen to be zero in this case, and the target is little-endian, so the values tail off with 0x323232, 0x3232, and finally 0x32.

Clearly the second case is the more "correct" solution, but you can fix the first case thus:

printf("%p : %d\n", 
       (void*)iter, 
       *(uint8_t*)iter);
