
Unsigned int from 32-bit to 64-bit OS

This code snippet is excerpted from a Linux book. If it is not appropriate to post it here, please let me know and I will delete it. Thanks.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char buf[30];
  char *p;
  int i;
  unsigned int index = 0;
  //unsigned long index = 0;
  printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));
  for(i = 'A'; i <= 'Z'; i++)
      buf[i - 'A'] = i;
  p  = &buf[1];
  printf("%c: buf=%p p=%p p[-1]=%p\n", p[index-1], buf, p, &p[index-1]);
  return 0;
}

On a 32-bit OS: this program works fine regardless of whether index is declared as unsigned int or unsigned long.

On a 64-bit OS: the same program crashes with a core dump if index is declared as unsigned int. However, if I change the type of index from unsigned int to a) unsigned long or b) unsigned short, the program works fine.

The book only says that the 64-bit case dumps core because the value is non-negative, but it does not explain why unsigned long and unsigned short work while unsigned int does not.

What confuses me is that

p + (0u -1) == p + UINT_MAX when index is unsigned int.

BUT,

p + (0ul - 1) == p[-1] when index is unsigned long.

I am stuck here.

If anyone can elaborate on the details, it would be highly appreciated!

Thank you.

Here are some results from my 32-bit machine (RHEL 5.10, gcc 4.1.2 20080704) and my 64-bit machine (RHEL 6.3, gcc 4.4.6 20120305).

I am not sure whether the gcc version makes any difference here, so I include that information as well.

On 32 bit:

I tried two changes:

1) Modify unsigned int index = 0 to unsigned short index = 0 .

2) Modify unsigned int index = 0 to unsigned char index = 0 .

In both cases the program runs without problems.

index-1 = ffffffff (sizeof 4)

A: buf=0xbfbdd5da p=0xbfbdd5db p[-1]=0xbfbdd5da

It seems that the type of index is promoted to 4 bytes in index - 1.

On 64 bit:

I tried three changes:

1) Modify unsigned int index = 0 to unsigned char index = 0 .

  It works!

index-1 = ffffffff (sizeof 4)

A: buf=0x7fffef304ae0 p=0x7fffef304ae1 p[-1]=0x7fffef304ae0

2) Modify unsigned int index = 0 to unsigned short index = 0 .

 It works!

index-1 = ffffffff (sizeof 4)

A: buf=0x7fff48233170 p=0x7fff48233171 p[-1]=0x7fff48233170

3) Modify unsigned int index = 0 to unsigned long index = 0 .

 It works!

index-1 = ffffffff (sizeof 8)

A: buf=0x7fffb81d6c20 p=0x7fffb81d6c21 p[-1]=0x7fffb81d6c20

BUT, only the original

unsigned int index = 0 runs into a core dump at the last printf.

index-1 = ffffffff (sizeof 4)

Segmentation fault (core dumped)

Do not lie to the compiler!

Passing printf an unsigned int where it expects an unsigned long ( %lx ) is undefined behavior.
(Creating a pointer that points outside any valid object (and not just one past the end of one) is UB too...)

Correct the format specifiers and the pointer arithmetic (that includes indexing as a special case) and everything will work.

UB includes "It works as expected" as well as "Catastrophic failure".

BTW: if you politely ask your compiler for all warnings, it will warn you. Use -Wall -Wextra -pedantic or similar.
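
For example, a corrected sketch (the choice of a signed, pointer-sized ptrdiff_t index and the %td/%zu specifiers is mine, not necessarily the book's intended fix):

#include <stddef.h>
#include <stdio.h>

int main(void)
{
  char buf[30];
  char *p;
  int i;
  ptrdiff_t index = 0;  /* signed and pointer-sized, so index - 1 really is -1 */

  /* %td matches ptrdiff_t, %zu matches size_t */
  printf("index-1 = %td (sizeof %zu)\n", index - 1, sizeof(index - 1));

  for (i = 'A'; i <= 'Z'; i++)
      buf[i - 'A'] = (char)i;

  p = &buf[1];
  /* p[index - 1] is p[-1], i.e. buf[0]: still inside the array, so well defined */
  printf("%c: buf=%p p=%p p[-1]=%p\n",
         p[index - 1], (void *)buf, (void *)p, (void *)&p[index - 1]);
  return 0;
}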

One other problem your code has is in your printf() :

  printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));

Let's simplify:

int i = 100;
printf("%lx", i - 100);

You are telling printf that the argument is a long, but in reality you are passing an int. clang gives the correct warning (I think gcc should also emit a similar warning). See:

test1.c:6:19: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
printf("%lx", i - 100);
        ~~~   ^~~~~~~
        %x   
1 warning generated.

The solution is simple: either pass a long to printf or tell printf to print an int:

printf("%lx", (long)(i-100) );
printf("%x", i-100);

You got lucky on 32-bit and your app did not crash. Porting it to 64-bit revealed a bug in your code, and now you can fix it.

Arithmetic on unsigned values is always defined, in terms of wrap-around. E.g. (unsigned)-1 is the same as UINT_MAX . So an expression like

p + (0u-1)

is equivalent to

p + UINT_MAX

( &p[0u-1] is equivalent to &*(p + (0u-1)) , which is p + (0u-1) .)
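
A stand-alone check of that equivalence (a minimal sketch, assuming the usual 32-bit unsigned int):

#include <limits.h>
#include <stdio.h>

int main(void)
{
  /* unsigned arithmetic wraps modulo UINT_MAX + 1, so 0u - 1 equals UINT_MAX */
  printf("%d\n", (0u - 1) == UINT_MAX);  /* prints 1 */
  printf("%x\n", 0u - 1);                /* prints ffffffff */
  return 0;
}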

Maybe this is easier to understand if we replace the pointers with unsigned integer types. Consider:

uint32_t p32; // say, this is a 32-bit "pointer"
uint64_t p64; // a 64-bit "pointer"

Assuming 16, 32, and 64 bit for short , int , and long , respectively (entries on the same line equal):

p32 + (unsigned short)-1    p32 + USHRT_MAX     p32 + (UINT_MAX>>16)
p32 + (0u-1)                p32 + UINT_MAX      p32 - 1
p32 + (0ul-1)               p32 + ULONG_MAX     p32 + UINT_MAX          p32 - 1

p64 + (0u-1)                p64 + UINT_MAX
p64 + (0ul-1)               p64 + ULONG_MAX     p64 - 1
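
These rows can be reproduced with fixed-width integers standing in for the pointers (a sketch; it assumes LP64, i.e. 32-bit int and 64-bit long as on the 64-bit Linux above, and reuses the addresses from the question's output):

#include <inttypes.h>
#include <stdio.h>

int main(void)
{
  uint32_t p32 = 0xbfbdd5db;           /* stand-in for the 32-bit pointer p */
  uint64_t p64 = 0x00007fffb81d6c21;   /* stand-in for the 64-bit pointer p */

  /* 32-bit "pointer" + (0u-1): wraps around, lands one byte below */
  printf("%08" PRIx32 "\n", (uint32_t)(p32 + (0u - 1)));   /* bfbdd5da */

  /* 64-bit "pointer" + (0u-1), i.e. + UINT_MAX: no wrap, jumps about 4 GiB up */
  printf("%016" PRIx64 "\n", p64 + (0u - 1));              /* 00008000b81d6c20 */

  /* 64-bit "pointer" + (0ul-1), i.e. + ULONG_MAX: wraps, lands one byte below */
  printf("%016" PRIx64 "\n", p64 + (0ul - 1));             /* 00007fffb81d6c20 */

  return 0;
}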

You can always replace operands of addition, subtraction and multiplication on unsigned types by something congruent modulo the maximum value + 1. For example,

-1 ≡ 0xffffffff (mod 2^32)

(0xffffffff is 2^32 - 1, or UINT_MAX ), and also

0xffffffffffffffff ≡ 0xffffffff (mod 2^32)

(for a 32-bit unsigned type you can always truncate to the least-significant 8 hex digits).

Your examples:

32-bit

  • unsigned short index = 0;

In index - 1 , index is promoted to int . The result has type int and value -1 (which is negative). Same for unsigned char .

64-bit

  • unsigned char index = 0;
  • unsigned short index = 0;

Same as for 32-bit: index is promoted to int , and index - 1 is negative (see the sketch after these examples).

  • unsigned long index = 0;

The output

index-1 = ffffffff (sizeof 8)

is weird: it is your only correct use of %lx , yet it looks as if it had been printed with %x (expecting 4 bytes). On my 64-bit computer (with a 64-bit long ) and with %lx I get:

index-1 = ffffffffffffffff (sizeof 8)

0xffffffffffffffff is -1 modulo 2^64.

  • unsigned index = 0;

An int cannot represent every value an unsigned int can, so in index - 1 nothing is promoted to int ; the result has type unsigned int and value -1 modulo 2^32 (which is positive , being the same as UINT_MAX or 0xffffffff, since the type is unsigned). For 32-bit addresses, adding this value is the same as subtracting one:

    bfbdd5db            bfbdd5db
+   ffffffff          -        1
=  1bfbdd5da
=   bfbdd5da          = bfbdd5da

(Note the wrap-around/truncation.) For 64-bit addresses, however:

    00007fff b81d6c21
+            ffffffff
=   00008000 b81d6c20

with no wrap-around. This is trying to access an invalid address, so you get a segfault.

Maybe have a look at 2's complement on Wikipedia .
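
Whether the subtraction goes negative or wraps to a huge positive value can also be checked directly (a minimal sketch of the promotion rules above, assuming char and short are narrower than int):

#include <stdio.h>

int main(void)
{
  unsigned char  uc = 0;
  unsigned short us = 0;
  unsigned int   ui = 0;

  /* uc and us are promoted to int, so the results are the signed value -1 */
  printf("%d %d\n", (uc - 1) < 0, (us - 1) < 0);  /* prints: 1 1 */

  /* ui is not promoted, so ui - 1 is the huge positive value UINT_MAX */
  printf("%d\n", (ui - 1) > 0);                   /* prints: 1 */
  return 0;
}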


Under my 64-bit Linux, using a specifier expecting a 32-bit value while passing a 64-bit type (and the other way round) seems to “work”: only the 32 least-significant bits are read. But use the correct ones. %lx expects an unsigned long , an unmodified %x an unsigned int , and %hx an unsigned short (an unsigned short is promoted to int when passed to printf as a variable argument, due to the default argument promotions). The length modifier for size_t is z , as in %zu :

printf("index-1 = %lx (sizeof %zu)\n", (unsigned long)(index-1), sizeof(index-1));

(The conversion to unsigned long doesn't change the value of an unsigned int , unsigned short , or unsigned char expression.)

sizeof(index-1) could also have been written as sizeof(+index) ; the only effect on the size of the expression comes from the usual arithmetic conversions, which are also triggered by unary + .
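
For example (a sketch assuming a 2-byte unsigned short and a 4-byte int):

#include <stdio.h>

int main(void)
{
  unsigned short index = 0;

  /* both expressions are promoted to int, so both report sizeof(int) */
  printf("%zu %zu %zu\n", sizeof(index), sizeof(index - 1), sizeof(+index));
  /* typically prints: 2 4 4 */
  return 0;
}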
