What happens in the background or in memory when we cast char * to int *?
I am learning about type casting of pointers and came across this program:
#include <stdio.h>
int main(void) {
    char *p = "01234567890123456789";
    int *pp = (int *)p;
    printf("%d", pp[0]);
}
When I run the program above, the output is 858927408.
What is this seemingly random number, and where does it come from? What is happening in the background or in memory?
Edit: And if I write printf("%c",pp[0]); then the output is 0, which is correct, but when I change pp[0] to pp[1] the output is 4. How?
If you print the result in hexadecimal (%x), you can see that:
858927408 = 0x33323130
0x33 is the ASCII code for '3'
0x32 is the ASCII code for '2'
0x31 is the ASCII code for '1'
0x30 is the ASCII code for '0'
So you are simply displaying the memory that stores 0123456... But since your processor is little-endian, you see the codes in reverse order.
In memory, you have (in hexadecimal):
30 31 32 33 34 35 36 37 38 # 0 1 2 3 4 5 6 7 8
39 30 31 32 33 34 35 36 37 # 9 0 1 2 3 4 5 6 7
38 39 00 # 8 9\0
In printf("%d", ...), you read the first 4 bytes as a little-endian integer, so it displays the result of 0x33*0x1000000 + 0x32*0x10000 + 0x31*0x100 + 0x30.
With %c, things are different:
If you write printf("%c", pp[0]), you try to print ONE character from 0x33323130, so only 0x30 is retained (in your case; this might be UB in some cases, I'm not sure), and it displays "0", whose ASCII code is 0x30.
If you write printf("%c", pp[1]), you try to print ONE character from 0x37363534, so only 0x34 is retained, and it displays "4", whose ASCII code is 0x34.
The first four bytes of "01234567890123456789" are 48, 49, 50, and 51 (hexadecimal 0x30, 0x31, 0x32, and 0x33), which are the ASCII codes for the characters "0", "1", "2", and "3". (int *)p converts p from char * to int *. Pointer conversions are not fully defined by the C standard, but in your implementation the result evidently points to the same place that p points to. When pp is set to (int *)p, pp[0] fetches the bytes at pp and interprets them as an int. In your implementation, int objects have four bytes, and bytes are ordered with the least significant byte at the lowest address. So the bytes 0x30, 0x31, 0x32, and 0x33 are read from memory and formed into the integer 0x33323130 (decimal 858927408). Three things about pointer conversions are relevant here:
1. On many implementations, int objects should be four-byte aligned, whereas char objects may have any alignment. If the address in p is not correctly aligned for an int, then the expression (int *)p could cause the program to crash or could cause undesired results.
2. The C standard does not specify what the result of converting a char * to an int * is, except that converting the result back to char * will yield the original pointer (or an equivalent pointer). In many C implementations, this conversion will yield a pointer to the same address, just with a different type.
3. pp[0] accesses the bytes at p as if they were an int. This violates a rule in the C standard, called the aliasing rule, that says an object shall have its value accessed only by an expression using a correct type. int is never a correct type for a char (or for several char). When this rule is violated, the C standard does not define the behavior.
The last point is important because C implementations may or may not support aliasing. Some C implementations support aliasing (meaning they define the behavior even though the C standard does not) because it was widely used and they want to support existing code that relies on it, or because it is needed in certain types of software.
Some C implementations do not support aliasing because this allows them to optimize programs better. (If the compiler can assume that an int * never points to a float, then it may be able to avoid reloading float data after assignments through int pointers, since those assignments could not have changed the float data.) Some compilers have switches so you can enable or disable aliasing support.
Since aliasing can break your program, you should understand the rules for it, avoid it when not needed, and know how to enable it when needed. In this case, aliasing is not needed to examine the results of reinterpreting the bytes of a string as an int. A safe way to do this is to copy the bytes into an int, as with:
char *p = "01234567890123456789";
int i;
memcpy(&i, p, sizeof i);
printf("%d\n", i);
This is the result of ((51×256+50)×256+49)×256+48, where 51 is the ASCII code of '3', 50 is the ASCII code of '2', and so on. In fact, pp[0] refers to 4 bytes of memory (int is 4 bytes), and those 4 bytes are "0123". int on your machine is little-endian, so '0' (which is 48 numerically) is the LSB and '3' is the MSB.
p[1] is one byte after p[0] because p is a pointer into a byte array, but pp[1] is 4 bytes after pp[0] because pp is a pointer into an int array and int is 4 bytes.
858927408 converted to hex is 0x33323130.
Apparently your system uses a little-endian format. In this format the LSB of the integer is stored first.
The first 4 bytes of the string, "0123", are taken for the integer. Their ASCII values are 0x30, 0x31, 0x32, 0x33 respectively. Since this is little-endian, the least significant byte of the integer is 0x30 and the most significant byte is 0x33.
That is how you get 0x33323130 as the output.
Edit, regarding the additional question from the OP:
And if I write printf("%c",pp[0]); then the output is 0, which is correct, but when I change pp[0] to pp[1] the output is 4. How?
When you use %c in printf and pass an integer argument, the integer is converted to a character, i.e. the least significant byte, 0x30, is taken and printed as ASCII.
pp[1] is the next integer in the array, which is 4 bytes later. So the least significant byte in this case is 0x34, and 4 is printed after conversion to ASCII.
It just sets the start address of the int object at the beginning of the string. The actual value of the int will depend on endianness and sizeof(int).
Since "01234567890123456789" is {0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39 ...} in memory, if the machine is little-endian and sizeof(int) == 4, the value will be 0x33323130. If the machine is big-endian, the value will be 0x30313233.