Question about the number of bits in C programming

If I do

int a = 3, then 3 will be represented in binary with 32 bits.

If I do

char a = 3, then 3 will be represented in binary with 8 bits.

My question is before doing the initialization with the value, how many bits does 3 get represented with?

(In other words, how many bits does the "3" on the right-hand side of the equals sign have?)

It's very common for int to be 32 bits, but that is not guaranteed. It can also be 16 or 64 bits, or even wider.

A single 3 is an integer literal of type int.

You can check this with the sizeof operator, which gives the size of its operand in bytes. Just try getting the size of int, a, and 3.

#include <stdio.h>
int main(void)
{
    int a = 3;
    // sizeof yields a size_t, which is printed with %zu
    printf("%zu\n", sizeof(a));   // gives 4 bytes (32 bits) on my PC
    printf("%zu\n", sizeof(int)); // gives 4 bytes (32 bits) on my PC
    printf("%zu\n", sizeof(3));   // gives 4 bytes (32 bits) on my PC

    return 0;
}

Also, 3 has type int, so its size equals the size of int.

The size of an object of type int is implementation-defined. The standard only requires that INT_MAX be no less than +32767, that is, 2^15 - 1.

If on your system an object of type int is 4 bytes, then an integer constant like 3 also has a size of 4 bytes: sizeof(3) yields 4.

Note that in C, character constants such as '\3' also have type int.

So in both of these declarations

char a = 3;

and

char a = '\3';

the constants 3 and '\3', both of type int, are 4 bytes in size if sizeof(int) is 4.

The 3 is called an integer constant, and it has a type, much like any named variable. An unsuffixed decimal constant is always of type int if its value fits in an int. Otherwise, the compiler tries long, then long long.

There are various rather intricate rules for how this is done, and I won't go into all the dirty details here; those who are interested can check the tables in section 6.4.4.1 of the C standard. For the average programmer, it is probably enough to know that we can also force an integer constant to be unsigned by adding a U suffix, or to be long by adding an L suffix: 3U, 3L, or a combination, 3UL. (Lowercase u and l work too.)

On mainstream real-world computers, int is 2 or 4 bytes and long is 4 or 8 bytes. Here is an example from a 64-bit Linux computer with a 4-byte int and an 8-byte long:

#include <stdio.h>
  
int main (void)
{
  printf("%zu\n", sizeof(int));        // 4
  printf("%zu\n", sizeof(3));          // 4
  printf("%zu\n", sizeof(3L));         // 8
  printf("%zu\n", sizeof(2147483647)); // 4, fits int
  printf("%zu\n", sizeof(2147483648)); // 8, doesn't fit in int
}

https://godbolt.org/z/3675zv

The question "how many bits does 3 get represented with?" is actually tricky. If we can find the 3 then we can answer it. So the question is: where is the 3 ?

What really happens is that:

int a = 3;

behaves, for a local (automatic) variable like this one, the same as:

int a;
a = 3;

The compiler makes sure there will be 4 bytes of space for the variable a (it does this at compile time), and then it also puts an instruction in the program which stores the number 3 in that space when you run the program.

We can use this useful online tool to compile a program and see what assembly/machine code the compiler actually outputs: https://godbolt.org/z/a9Pohn

In this case, I entered the program:

int main() {
    int a;
    a = 3;
}

and compiled it with "x86-64 gcc 10.2", no optimizations. Here is the compiled code (both assembly and machine code):

main:
 55                     push rbp
 48 89 e5               mov rbp,rsp
 c7 45 fc 03 00 00 00   mov DWORD PTR [rbp-0x4],0x3
 b8 00 00 00 00         mov eax,0x0
 5d                     pop rbp
 c3                     ret 

If we read the assembly, we can see that the instruction the compiler chose to insert into the program to initialize the variable a was mov DWORD PTR [rbp-0x4],0x3. In machine code it is written c7 45 fc 03 00 00 00. This instruction is where the number 3 comes from.

The instruction is 7 bytes long. c7 45 tells the CPU what kind of instruction this is ("put a specific number at a specific position in the stack frame"): c7 is the opcode, and 45 says the destination is an address relative to rbp with an 8-bit offset. fc is that offset, -4 in two's complement, i.e. the position in the stack frame. And 03 00 00 00 is the specific number stored there, in little-endian byte order. This is the number 3 from the source code, so in this case it takes up 4 bytes.


Note that it's not always the same. If we compile for an ARM CPU instead of x86-64, these are the relevant instructions:

mov r3, #3
str r3, [fp, #-8]

Unfortunately godbolt won't show us the machine code here, but we can look up MOV (immediate) in the ARM manual. It tells us that the instruction is 4 bytes long and that the number being moved takes up only part of it: 12 bits (an 8-bit value plus a 4-bit rotation) in the classic encoding, or 16 bits in the MOVW form. The remaining bits of the register are automatically zeroes. If you use a number that doesn't fit, the compiler obviously has to use a different instruction.

Usually we don't talk about the sizes of instructions, since they vary a lot more than data sizes. int a; always reserves 4 bytes (if your system's int is 4 bytes), but the instruction that puts the specific bits into that space can vary in size.


Even on x86-64, numbers can take up different amounts of space. If I write return 0;, an optimizing compiler translates that to xor eax, eax (31 c0). There is no number 0 in that instruction at all! (31 c0 only encodes which operation to perform and on which registers; it contains no immediate data.)
