Casting int pointer to char pointer causes loss of data in C?

I have the following piece of code:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char *argv[])
{
  int n = 260; 
  int *p = &n;
  char *pp = (char*)p;
  *pp = 0;

  printf("n = %d\n", n);
  system("PAUSE");  
  return 0;
}

The output of the program is n = 256. I think I understand why, but I am not really sure. Can anyone give me a clear explanation, please?

Thanks a lot.

The int 260 (= 256 * 1 + 4) will look like this in memory. Note that this depends on the endianness of the machine (the layout below is little-endian) and assumes a 32-bit (4-byte) int:

0x04 0x01 0x00 0x00

By using a char pointer, you point to the first byte and change it to 0x00, which changes the int to 256 (= 256 * 1 + 0).

You're apparently working on a little-endian machine. You start with an int that takes up at least two bytes. The value 260 is 256 + 4. The 4 is stored in the first byte, and the 256 in the second byte (as the value 1, which carries a weight of 256). When you write 0 to the first byte, you're left with only the 256 in the second byte.

I understood exactly what happens by changing the value written:

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
  int n = 260; 
  int *p = &n;
  char *pp = (char*)p;
  *pp = 20;

  printf("pp = %d\n", (int)*pp);
  printf("n = %d\n", (int)n);
  system("PAUSE");  
  return 0;
}

The output values are 20 and 276.

So basically the problem is not that you have data loss. The char pointer points only to the first byte of the int, so the write changes only that byte. The other bytes are not changed, and that's why you get those weird values. (If you are on an Intel processor, the first byte is the least significant, which is why you change the "smallest" part of the number.)

In C a pointer references a block of bytes whose size is determined by the type associated with the pointer. So in your case the integer pointer refers to a block 4 bytes in size, while a char is only one byte long. When you set the char to 0, it only changes the first byte of the integer value. Because of how numbers are stored in memory on little-endian machines (effectively in reverse order from how you would write them), you are overwriting the least significant byte (which was 4), and you are left with 256 as the value.

Considering 32-bit systems, 260 will be represented like this:

00000000 (Byte-3)   00000000 (Byte-2)   00000001 (Byte-1)   00000100 (Byte-0)

Now when p is typecast to a char pointer, the label on the pointer changes, but the memory contents don't. Earlier, dereferencing p accessed 4 bytes, as it was an integer pointer; dereferencing the char pointer accesses only 1 byte. So only the least significant byte (Byte-0) is changed to zero, not all 4 bytes.

And it becomes

00000000 (Byte-3)   00000000 (Byte-2)   00000001 (Byte-1)   00000000 (Byte-0)

Hence, the output is 256.

Your problem is the assignment *pp = 0;. You're dereferencing pp, which points to n, and changing n. However, pp is a char pointer, so it doesn't change all of n, which is an int. This causes the binary complications described in the other answers.

In terms of the C language, what you are doing is modifying the representation of the int variable n. In C, all types have a "representation" as one or more bytes (unsigned char), and it's legal to access the underlying representation by casting a pointer to char * or unsigned char *. The latter is better, for reasons that would just unnecessarily complicate things if I went into them here.

As schnaader answered, on a little-endian, two's complement implementation with 32-bit int, the representation of 260 is:

0x04 0x01 0x00 0x00

and overwriting the first byte with 0 yields:

0x00 0x01 0x00 0x00

which is the representation for 256 on such an implementation.

C allows implementations that have padding bits and trap representations (which can raise a signal or abort your program if they're accessed), so in general overwriting part but not all of an int in this way is not safe. Nonetheless, it does work on most real-world machines, and if you instead used the type uint32_t, it would be guaranteed to work (although the ordering of the bytes would still be implementation-dependent).
