Unsigned integers when moving from a 32-bit to a 64-bit OS

Question

This code snippet is excerpted from a Linux book. If it is not appropriate to post it here, please let me know and I will delete it. Thanks.

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
  char buf[30];
  char *p;
  int i;
  unsigned int index = 0;
  //unsigned long index = 0;
  printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));
  for(i = 'A'; i <= 'Z'; i++)
      buf[i - 'A'] = i;
  p  = &buf[1];
  printf("%c: buf=%p p=%p p[-1]=%p\n", p[index-1], buf, p, &p[index-1]);
  return 0;
}

In a 32-bit OS environment: this program works fine whether index is declared as unsigned int or unsigned long.

In a 64-bit OS environment: the same program dumps core if index is declared as unsigned int. However, if I only change the type of index from unsigned int to a) unsigned long or b) unsigned short, the program works fine too.

The book only says that the core dump on 64-bit is caused by the index being a non-negative number. But I do not understand exactly why unsigned long and unsigned short work while unsigned int does not.

What confuses me is that

p + (0u - 1) == p + UINT_MAX when index is unsigned int,

BUT

p + (0ul - 1) == &p[-1] when index is unsigned long.

I am stuck here.

If anyone can explain the details, it would be highly appreciated!

Thank you.

Here are some results from my 32-bit machine (RHEL5.10, gcc version 4.1.2 20080704) and my 64-bit machine (RHEL6.3, gcc version 4.4.6 20120305).

I am not sure whether the gcc version makes any difference here, so I include that information as well.

On 32-bit:

I tried two changes:

1) Change unsigned int index = 0 to unsigned short index = 0.

2) Change unsigned int index = 0 to unsigned char index = 0.

The program runs without problems.

index-1 = ffffffff (sizeof 4)

A: buf=0xbfbdd5da p=0xbfbdd5db p[-1]=0xbfbdd5da

It seems that index is promoted to a 4-byte type because of the -1.

On 64-bit:

I tried three changes:

1) Change unsigned int index = 0 to unsigned char index = 0.

  It works!

index-1 = ffffffff (sizeof 4)

A: buf=0x7fffef304ae0 p=0x7fffef304ae1 p[-1]=0x7fffef304ae0

2) Change unsigned int index = 0 to unsigned short index = 0.

 It works!

index-1 = ffffffff (sizeof 4)

A: buf=0x7fff48233170 p=0x7fff48233171 p[-1]=0x7fff48233170

3) Change unsigned int index = 0 to unsigned long index = 0.

 It works!

index-1 = ffffffff (sizeof 8)

A: buf=0x7fffb81d6c20 p=0x7fffb81d6c21 p[-1]=0x7fffb81d6c20

BUT only unsigned int index = 0 runs into the core dump, at the last printf.

index-1 = ffffffff (sizeof 4)

Segmentation fault (core dumped)

Answer 1

Do not lie to the compiler!

Passing printf an int where it expects a long (as %lx does) is undefined behavior.
(Creating a pointer that points outside any valid object, and not just one past its end, is UB too...)

Correct the format specifiers and the pointer arithmetic (indexing is a special case of it) and everything will work.

(UB includes "It works as expected" as well as "Catastrophic failure".)

BTW: If you politely ask your compiler for all warnings, it will warn you. Use -Wall -Wextra -pedantic or similar.

Answer 2

Another problem your code has is in your printf():

  printf("index-1 = %lx (sizeof %d)\n", index-1, sizeof(index-1));

Let's simplify:

int i = 100;
printf("%lx", i - 100);

You are telling printf that a long is coming, but in reality you are passing an int. clang gives you the correct warning (I think gcc should emit it as well). See:

test1.c:6:19: warning: format specifies type 'unsigned long' but the argument has type 'int' [-Wformat]
printf("%lx", i - 100);
        ~~~   ^~~~~~~
        %x   
1 warning generated.

The solution is simple: either pass a long to printf or tell printf to print an int:

printf("%lx", (long)(i-100) );
printf("%x", i-100);

You got lucky on 32-bit and your app did not crash. Porting it to 64-bit revealed a bug in your code, and you can now fix it.

Answer 3

Arithmetic on unsigned values is always defined, in terms of wrap-around. E.g. (unsigned)-1 is the same as UINT_MAX. So an expression like

p + (0u-1)

is equivalent to

p + UINT_MAX

(&p[0u-1] is equivalent to &*(p + (0u-1)) and to p + (0u-1)).

Maybe this is easier to understand if we replace the pointers with unsigned integer types. Consider:

uint32_t p32; // say, this is a 32-bit "pointer"
uint64_t p64; // a 64-bit "pointer"

Assuming 16, 32, and 64 bits for short, int, and long, respectively (entries on the same line are equal):

p32 + (unsigned short)-1    p32 + USHRT_MAX     p32 + (UINT_MAX>>16)
p32 + (0u-1)                p32 + UINT_MAX      p32 - 1
p32 + (0ul-1)               p32 + ULONG_MAX     p32 + UINT_MAX          p32 - 1

p64 + (0u-1)                p64 + UINT_MAX
p64 + (0ul-1)               p64 + ULONG_MAX     p64 - 1

You can always replace operands of addition, subtraction and multiplication on unsigned types by something congruent modulo the maximum value + 1. For example,

-1 ≡ 0xffffffff (mod 2^32)

(0xffffffff is 2^32 - 1, or UINT_MAX), and also

0xffffffffffffffff ≡ 0xffffffff (mod 2^32)

(for a 32-bit unsigned type you can always truncate to the least-significant 8 hex digits).

Your examples:

32-bit

unsigned short index = 0;

In index - 1, index is promoted to int. The result has type int and value -1 (which is negative). The same goes for unsigned char.

64-bit

unsigned char index = 0;
unsigned short index = 0;

Same as for 32-bit: index is promoted to int, and index - 1 is negative.

unsigned long index = 0;

The output

index-1 = ffffffff (sizeof 8)

is weird: it is your only correct use of %lx, but it looks as if you had printed it with %x (expecting 4 bytes). On my 64-bit computer (with 64-bit long) and with %lx I get:

index-1 = ffffffffffffffff (sizeof 8)

0xffffffffffffffff is -1 modulo 2^64.

unsigned index = 0;

An int cannot hold every value an unsigned int can, so in index - 1 nothing is promoted to int; the result has type unsigned int and value -1 (which is positive, being the same as UINT_MAX or 0xffffffff, since the type is unsigned). For 32-bit addresses, adding this value is the same as subtracting one:

    bfbdd5db            bfbdd5db
+   ffffffff          -        1
=  1bfbdd5da
=   bfbdd5da          = bfbdd5da

(Note the wrap-around/truncation.) For 64-bit addresses, however:

    00007fff b81d6c21
+            ffffffff
=   00008000 b81d6c20

with no wrap-around. This tries to access an invalid address, so you get a segfault.

Maybe have a look at two's complement on Wikipedia.

Under my 64-bit Linux, using a specifier expecting a 32-bit value while passing a 64-bit type (and the other way round) seems to "work"; only the 32 least-significant bits are read. But use the correct ones: lx expects an unsigned long, unmodified x an unsigned int, and hx an unsigned short (an unsigned short is promoted to int when passed to printf, since it is passed as a variable argument and undergoes the default argument promotions). The length modifier for size_t is z, as in %zu:

printf("index-1 = %lx (sizeof %zu)\n", (unsigned long)(index-1), sizeof(index-1));

(The conversion to unsigned long doesn't change the value of an unsigned int, unsigned short, or unsigned char expression.)

sizeof(index-1) could also have been written as sizeof(+index); the only thing that affects the size of the expression is the integer promotions (part of the usual arithmetic conversions), which unary + triggers as well.


3 answers

Answer 1
1 2014-08-10 13:18:38

Answer 2
1 2014-08-10 13:22:10

Answer 3
-1 accepted 2014-08-10 13:21:13
