
Convert byte array to unsigned int using pointers

char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);

so if the memory looks like this: 0000 0000 0000 0000 0000 0000 0000 0001

The program outputs 0. How do I make it output a uint value of the entire 32 bits?

Because of type promotion: char will promote to int when accessed, and you'll get no diagnostic for this. So what you are doing is dereferencing the first element of your char array, which is 0, and assigning it to an int, which likewise ends up being 0.

What you want to do is technically undefined behavior but generally works. You want to do this:

unsigned int j = *reinterpret_cast<unsigned int*>(f);

At this point you'll be dealing with undefined behavior and with the endianness of the platform. You probably do not have the value you want recorded in your byte stream. You're treading in territory that requires intimate knowledge of your compiler and your target architecture.
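
To make that byte-order dependence concrete, here is a minimal sketch of my own (not part of the answer above) that assembles the same four bytes with memcpy, which yields the same bit pattern as the reinterpreting cast but without the aliasing problem:

#include <stdio.h>
#include <string.h>

int main(void)
{
    unsigned char bytes[4] = { 0, 0, 0, 1 };
    unsigned int j;

    /* Copy the raw bytes into the integer; the value you get
       depends on the host's byte order. */
    memcpy(&j, bytes, sizeof j);

    /* Prints 16777216 (0x01000000) on a little-endian host and 1 on a
       big-endian host, assuming unsigned int is 32 bits wide. */
    printf("%u\n", j);
    return 0;
}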

Supposing your platform supports 32-bit integers, you can do the following to achieve the kind of cast you want:

#include <stdint.h>   /* uint32_t */
#include <stdlib.h>   /* malloc() */
#include <string.h>   /* memcpy() */
#include <stdio.h>    /* printf() */

char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;

uint32_t j;
memcpy(&j, f, sizeof(j));
printf("%u\n", j);

Be aware of endianness in integer representation.
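
If you need to know the host's byte order at run time, one common sketch (my own addition, not from the answer above) is to inspect the first byte of a known integer value:

#include <stdio.h>

int main(void)
{
    unsigned int probe = 1;

    /* Reading through unsigned char* is permitted by the aliasing rules.
       If the first byte in memory is 1, the least significant byte comes
       first, i.e. the host is little-endian. */
    if (*(unsigned char *)&probe == 1)
        printf("little-endian host\n");
    else
        printf("big-endian host\n");
    return 0;
}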

In order to ensure that your code works on both little endian and big endian systems, you could do the following:

#include <stdio.h>       /* printf() */
#include <stdint.h>      /* int32_t */
#include <arpa/inet.h>   /* ntohl() */

char f[4] = {0,0,0,1};
int32_t j = *((int32_t *)f);
j = ntohl(j);
printf("%d", j);

This will print 1 on both little endian and big endian systems. Without using ntohl, 1 will only be printed on Big Endian systems.

The code works because f is assigned values in the same order as on a Big Endian system. Since network byte order is also Big Endian, ntohl will correctly convert j. If the host is Big Endian, j will remain unchanged; if the host is Little Endian, the bytes of j will be reversed.
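
For the opposite direction, storing an integer into the byte array in the same big-endian layout, htonl can be used symmetrically. A sketch under the same assumptions (<arpa/inet.h> is POSIX; on Windows the function is declared in <winsock2.h>):

#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>   /* htonl() */

int main(void)
{
    uint32_t value = 1;
    char f[4];

    /* Convert to network (big-endian) order, then copy the bytes out. */
    uint32_t be = htonl(value);
    memcpy(f, &be, sizeof be);

    printf("%d %d %d %d\n", f[0], f[1], f[2], f[3]);   /* prints 0 0 0 1 */
    return 0;
}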

What happens in the line:

unsigned int j = *f; 

is simply assigning the first element of f to the integer j. It is equivalent to:

unsigned int j = f[0];

and since f[0] is 0 it is really just assigning a 0 to the integer:

unsigned int j = 0;

You will have to convert the elements of f.

Reinterpretation will always cause undefined behavior; it violates the strict aliasing rule, and the pointer may not even be suitably aligned for unsigned int. The following example shows such usage, and it is always incorrect:

unsigned int j = *( unsigned int* )f;

Undefined behavior may produce any result, even apparently correct ones. Even if such code appears to produce correct results when you run it for the first time, this isn't proof that the program is defined. The program is still undefined, and may produce incorrect results at any time.

There is no such thing as "technically undefined behavior" or "generally works"; the program is either undefined or it is not. Relying on such statements is dangerous and irresponsible.

Luckily we don't have to rely on such bad code.

All you need to do is choose the representation of the integer that will be stored in f, and then convert it. It appears you want to store in big-endian, with at most 8 bits per element. This doesn't mean that the machine must be big-endian, only the representation of the integer you're encoding in f. Representation of integers on the machine is not important, as this method is completely portable.

This means the most significant byte will appear first. The most significant byte is f[0], and the least significant byte is f[3].

We will need an integer type capable of storing at least 32 bits, and unsigned long is guaranteed to be at least that wide.

Type char is for storing characters, not integers. An unsigned integer type like unsigned char should be used instead.

Then only the conversion from big-endian encoded in f must be done:

unsigned char encoded[4] = { 0 , 0 , 0 , 1 };
unsigned long value = 0;
value = value | ( ( ( unsigned long )encoded[0] & 0xFF ) << 24 );
value = value | ( ( ( unsigned long )encoded[1] & 0xFF ) << 16 );
value = value | ( ( ( unsigned long )encoded[2] & 0xFF ) << 8 );
value = value | ( ( ( unsigned long )encoded[3] & 0xFF ) << 0 );
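
Wrapped in a small helper (the function name decode_be32 is mine, purely for illustration), the same shift-and-or conversion can be turned into a complete program:

#include <stdio.h>

/* Decode four bytes stored most-significant-first (big-endian)
   into an integer value. */
static unsigned long decode_be32(const unsigned char encoded[4])
{
    unsigned long value = 0;
    value |= ((unsigned long)encoded[0] & 0xFF) << 24;
    value |= ((unsigned long)encoded[1] & 0xFF) << 16;
    value |= ((unsigned long)encoded[2] & 0xFF) << 8;
    value |= ((unsigned long)encoded[3] & 0xFF) << 0;
    return value;
}

int main(void)
{
    unsigned char encoded[4] = { 0, 0, 0, 1 };

    /* Prints 1 regardless of the host's byte order. */
    printf("%lu\n", decode_be32(encoded));
    return 0;
}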

regarding the posted code:

char* f = (char*)malloc(4 * sizeof(char));
f[0] = 0;
f[1] = 0;
f[2] = 0;
f[3] = 1;
unsigned int j = *f;
printf("%u\n", j);
  1. In C, the return type of malloc() is void*, which can be assigned to any other object pointer, so the cast just clutters the code and can become a problem when the code is later maintained.
  2. The C standard defines sizeof(char) as 1, so that expression has absolutely no effect as part of the expression passed to malloc().
  3. The size of an int is not necessarily 4 (think of microprocessors or 64-bit architectures).
  4. The function calloc() will preset all the bytes to 0x00.
  5. Which byte should be set to 0x01 depends on the endianness of the underlying architecture.

Let's assume, for now, that your computer is a little-endian architecture (i.e. Intel or similar).

Then the code should look similar to the following:

#include <stdio.h>  // printf(), perror()
#include <stdlib.h> // calloc(), free(), exit(), EXIT_FAILURE

int main( void )
{
    char *f = calloc( 1, sizeof(unsigned int) );
    if( !f )
    {
        perror( "calloc failed" );
        exit( EXIT_FAILURE );
    }

    // implied else, calloc successful

    // f[sizeof(unsigned int)-1] = 0x01; // if big Endian
    f[0] = 0x01;   // assume little Endian/Intel x86 or similar
    unsigned int j = *(unsigned int*)f;
    printf("%u\n", j);

    free( f );
    return 0;
}

Which when compiled/linked, outputs the following:

1
