
Unable to understand pointers in C and typecasting

I can't understand why the third and fourth printf calls print 54 and -61. I expected 0 as output, because a character pointer should only read (sizeof(char) * 8) bits, and 54 in binary is 00000000 00110110.

#include<stdio.h>
void main()
{
      int i=54;
      float a=3.14;
      char *ii,*aa;

      ii=(char *)&i;
      aa=(char *)&a;

      printf("%u\n",ii);
      printf("%u\n",aa);
      printf("%d\n",*ii);
      printf("%d\n",*aa);

}

Edit: If I use %f in the fourth printf (I typed %d by mistake), it prints 0.00000. Why?

Why is the third output 54?

Your third output displays 54, because on your machine,

int i=54;

is stored in memory like this:

36 00 00 00

your pointer points here:

36 00 00 00
^^

And thus when you print out that 0x36 as a char (a one byte long integral type), you see 54.

This storage format is called " little endian ", and is used on x86 and amd64 processors, which are quite common.

Note that the language does not guarantee that integers are stored this way; you may very well get a different result with a different machine or compiler. Don't depend on it.

What about the float?

The float works similarly, but is much more complicated to show. Again, it's quite machine dependent. On amd64, if you encode 3.14 as an IEEE 754 single-precision value (this is platform dependent) and then store the four bytes in reverse order (I believe amd64 stores floats "little endian" as well, though I'm not sure why, since it's a float.¹), the byte in the first slot, read as a signed 8-bit two's complement integer (this is also platform dependent), should work out to the value you're seeing.

Last, you say:

i didn't know about little edian. but is that not with float. it is giving 0.000000000 if i use %f in place of %d in fourth (by mistake i typed %d here)

I'm going to assume you mean:

printf("%f\n",*aa);

And that aa is still a char *. This isn't valid: for %f, you need to pass a double (a float argument is promoted to double automatically). However, let's plow on, and attempt to explain this (undefined!) behavior.

Since it's a char * , when you dereference it, on your machine, it'll likely read some one-byte value. 3.14 , as a little endian float, is:

c3 f5 48 40
^^

0xc3 , as a two's complement signed one byte integer, is -61, which explains your question. Thus, for your program *aa is -61. When you pass this to printf , it'll be promoted to an int , because printf is a "varargs" (variable number of arguments) function. You can see this when compiling in some compilers:

prog1.c:14:7: warning: format '%f' expects argument of type 'double', but argument 2 has type 'int' [-Wformat]

Thus, an "int" will get passed to printf in whatever manner your platform uses. Let's investigate that. For explicitness, I'm compiling the following:

#include<stdio.h>
int main()
{
    int i=54;
    float a=3.14;
    char *ii,*aa;

    ii=(char *)&i;
    aa=(char *)&a;

    printf("%u\n",ii);
    printf("%u\n",aa);
    printf("%d\n",*ii);
    printf("%f\n",*aa);

    return 0;
}

I do:

% gcc -g -o prog1 prog1.c
prog1.c: In function ‘main’:
prog1.c:11:2: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat]
prog1.c:12:2: warning: format ‘%u’ expects argument of type ‘unsigned int’, but argument 2 has type ‘char *’ [-Wformat]
prog1.c:14:2: warning: format ‘%f’ expects argument of type ‘double’, but argument 2 has type ‘int’ [-Wformat]

(In case it isn't clear: gcc is throwing really good warnings here: it's pointing out undefined behavior — bugs — in your program. You should always fix these. We're going to ignore them to investigate, but note that the compiler can really do whatever it wants at this point, so everything below is anything but guaranteed.)

Then, let's start this in a debugger, and stop on that last printf. For me, that's line 14. Thus:

% gdb prog1
GNU gdb (Gentoo 7.6.2 p1) 7.6.2
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-pc-linux-gnu".
For bug reporting instructions, please see:
<http://bugs.gentoo.org/>...
Reading symbols from /home/me/code/random/prog1...done.
(gdb) break prog1.c:14
Breakpoint 1 at 0x4005db: file prog1.c, line 14.

Let's run it up to that breakpoint.

(gdb) r
Starting program: /home/me/code/random/prog1 
warning: Could not load shared library symbols for linux-vdso.so.1.
Do you need "set solib-search-path" or "set sysroot"?
4294959628
4294959624
54

Breakpoint 1, main () at prog1.c:14
14      printf("%f\n",*aa);

Now we're stopped on the " printf ", but what does that mean? Let's look at some assembler!

(gdb) disassemble
Dump of assembler code for function main:
   0x000000000040056c <+0>: push   %rbp
   0x000000000040056d <+1>: mov    %rsp,%rbp
   0x0000000000400570 <+4>: sub    $0x20,%rsp
   0x0000000000400574 <+8>: movl   $0x36,-0x14(%rbp)
   0x000000000040057b <+15>:    mov    0x12f(%rip),%eax        # 0x4006b0
   0x0000000000400581 <+21>:    mov    %eax,-0x18(%rbp)
   0x0000000000400584 <+24>:    lea    -0x14(%rbp),%rax
   0x0000000000400588 <+28>:    mov    %rax,-0x8(%rbp)
   0x000000000040058c <+32>:    lea    -0x18(%rbp),%rax
   0x0000000000400590 <+36>:    mov    %rax,-0x10(%rbp)
   0x0000000000400594 <+40>:    mov    -0x8(%rbp),%rax
   0x0000000000400598 <+44>:    mov    %rax,%rsi
   0x000000000040059b <+47>:    mov    $0x4006a4,%edi
   0x00000000004005a0 <+52>:    mov    $0x0,%eax
   0x00000000004005a5 <+57>:    callq  0x400450 <printf@plt>
   0x00000000004005aa <+62>:    mov    -0x10(%rbp),%rax
   0x00000000004005ae <+66>:    mov    %rax,%rsi
   0x00000000004005b1 <+69>:    mov    $0x4006a4,%edi
   0x00000000004005b6 <+74>:    mov    $0x0,%eax
   0x00000000004005bb <+79>:    callq  0x400450 <printf@plt>
   0x00000000004005c0 <+84>:    mov    -0x8(%rbp),%rax
   0x00000000004005c4 <+88>:    movzbl (%rax),%eax
   0x00000000004005c7 <+91>:    movsbl %al,%eax
   0x00000000004005ca <+94>:    mov    %eax,%esi
   0x00000000004005cc <+96>:    mov    $0x4006a8,%edi
   0x00000000004005d1 <+101>:   mov    $0x0,%eax
   0x00000000004005d6 <+106>:   callq  0x400450 <printf@plt>
=> 0x00000000004005db <+111>:   mov    -0x10(%rbp),%rax
   0x00000000004005df <+115>:   movzbl (%rax),%eax
   0x00000000004005e2 <+118>:   movsbl %al,%eax
   0x00000000004005e5 <+121>:   mov    %eax,%esi
   0x00000000004005e7 <+123>:   mov    $0x4006ac,%edi
   0x00000000004005ec <+128>:   mov    $0x0,%eax
   0x00000000004005f1 <+133>:   callq  0x400450 <printf@plt>
   0x00000000004005f6 <+138>:   mov    $0x0,%eax
   0x00000000004005fb <+143>:   leaveq 
   0x00000000004005fc <+144>:   retq   

That's main, and the arrow (=>) is where we are. The call instruction at 0x00000000004005f1 is the call to your fourth printf, and as you can see, there's some setup required to call it: all those mov instructions. Since they set up the call, and what we're interested in is what gets passed to printf, we need to let them run and stop the program right at that call instruction. We can do this with another breakpoint:

(gdb) break *0x00000000004005f1
Breakpoint 2 at 0x4005f1: file prog1.c, line 14.
(gdb) continue
Continuing.

Breakpoint 2, 0x00000000004005f1 in main () at prog1.c:14
14      printf("%f\n",*aa);

Now we're at that call instruction. Because I'm on an amd64 chip (an Intel Core i7; these are also sometimes referred to as x86-64) and I'm not running Windows, a function is called by putting the arguments, from left to right, into certain registers. Counting from the right, the first argument is *aa, which, remember, we've established to be -61. We can dump our registers with:

(gdb) info all-registers
rax            0x0  0
rbx            0x0  0
rcx            0x2  2
rdx            0x7ffff7dd7820   140737351874592
rsi            0xffffffc3   4294967235
rdi            0x4006ac 4196012
rbp            0x7fffffffe220   0x7fffffffe220
rsp            0x7fffffffe1f8   0x7fffffffe1f8
r8             0x2  2
r9             0x7ffff7dd4640   140737351861824
r10            0x7fffffffe0d8   140737488347352
r11            0x246    582
r12            0x400480 4195456
r13            0x7fffffffe300   140737488347904

[ snip … ]

ymm0           {v8_float = {0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, v4_double = {0x0, 0x0, 0x0, 0x0}, v32_int8 = {0x0, 0x0, 0x0, 0x0, 0xff, 0x0, 0x0, 0x0, 
    0xff, 0x0, 0x0, 0x0, 0xff, 0x0 <repeats 19 times>}, v16_int16 = {0x0, 0x0, 0xff, 0x0, 0xff, 0x0, 0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0}, 
  v8_int32 = {0x0, 0xff, 0xff, 0xff, 0x0, 0x0, 0x0, 0x0}, v4_int64 = {0xff00000000, 0xff000000ff, 0x0, 0x0}, v2_int128 = {0x000000ff000000ff000000ff00000000, 
    0x00000000000000000000000000000000}}

Since -61 is an integer, it ends up in an integer register; here, we can see that it's in rsi.² (It's been sign extended, which is why it's 0xffffffc3: -61 in 4 bytes, instead of one.) However, %f expects a floating-point argument, so printf will most likely read a floating point register, such as xmm0 (the low half of ymm0) on my machine. It happens to be zero. That doesn't need to be true, since this is undefined behavior, but it is, and thus we get zero.

¹This isn't one of those things you care about often, except for morbid curiosity.
²The only part I couldn't explain at first is why our integer ended up in rsi. I felt like it should have been in rdi. Like I said, morbid curiosity. (Edit: Ugh, curse my curiosity. It ends up in rsi because rsi is used for the second argument, and *aa is the second argument; the format string is the first, in rdi. Wikipedia has it labelled as "right to left", but that only applies to stuff on the stack: registers are assigned left to right.)
