
Question about the number of bits in C programming

If I do

int a = 3, then 3 will be represented in binary with 32 bits.

If I do

char a = 3, then 3 will be represented in binary with 8 bits.

My question is: before the initialization with the value, how many bits is 3 represented with?

(In other words, how many bits does the "3" on the right-hand side of the equals sign have?)

It's very common for int to have 32 bits, but it's not guaranteed. It can be 16 or 64 bits too, or wider.

A single 3 is an integer literal of type int.

You can check it using the sizeof operator, which gives you the size of its argument in bytes. Just try getting the size of int, a and 3.

#include <stdio.h>
int main(void)
{
    int a = 3;
    // sizeof yields a size_t, so the matching format specifier is %zu
    printf("%zu\n", sizeof(a));   // gives 4 bytes (32 bits) on my PC
    printf("%zu\n", sizeof(int)); // gives 4 bytes (32 bits) on my PC
    printf("%zu\n", sizeof(3));   // gives 4 bytes (32 bits) on my PC

    return 0;
}

Also, 3 has type int, so its size is equal to the size of int.

The size of an object of type int is implementation-defined. The standard only requires that INT_MAX shall not be less than +32767, that is, 2^15 - 1.

If on your system the size of an object of type int is 4, then an integer constant like 3 also has a size of 4 bytes.

Note that character constants, for example, also have type int.

So in both of these declarations

char a = 3;

and

char a = '\3';

the constants 3 and '\3' have type int and occupy 4 bytes if sizeof(int) is equal to 4.

The 3 is called an integer constant and it has a type, much like any named variable. It is always of type int if the number typed can fit inside an int. Otherwise, if it can't fit, the compiler will try to fit it inside a long, then a long long.

There are various rather intricate rules for how this is done, and I won't go into all the dirty details here; those who are interested can check the tables in the C standard, 6.4.4.1. For the average programmer it is probably enough to know that we can also force the integer constant to be unsigned by adding a U suffix, or force it to be long by adding an L suffix. That is, 3U or 3L, or the combination 3UL. (Lower case u and l work too.)

On real-world computers, int is almost always 2 or 4 bytes large, and long is either 4 or 8 bytes large. Example from a 64-bit Linux computer with a 4-byte int and an 8-byte long:

#include <stdio.h>
  
int main (void)
{
  printf("%zu\n", sizeof(int));        // 4
  printf("%zu\n", sizeof(3));          // 4
  printf("%zu\n", sizeof(3L));         // 8
  printf("%zu\n", sizeof(2147483647)); // 4, fits int
  printf("%zu\n", sizeof(2147483648)); // 8, doesn't fit in int, so it becomes long
}

https://godbolt.org/z/3675zv

The question "how many bits does 3 get represented with?" is actually tricky. If we can find the 3, then we can answer it. So the question is: where is the 3?

What really happens is that:

int a = 3;

is the same as:

int a;
a = 3;

The compiler makes sure there will be 4 bytes of space for the variable a (it does this at compile time), and it also puts an instruction in the program which stores the number 3 in that space when you run the program.

We can use this useful online tool to compile a program and see what assembly/machine code the compiler actually outputs: https://godbolt.org/z/a9Pohn

In this case, I entered the program:

int main() {
    int a;
    a = 3;
}

and compiled it with "x86-64 gcc 10.2", no optimizations. Here is the compiled code (both assembly and machine code):

main:
 55                     push rbp
 48 89 e5               mov rbp,rsp
 c7 45 fc 03 00 00 00   mov DWORD PTR [rbp-0x4],0x3
 b8 00 00 00 00         mov eax,0x0
 5d                     pop rbp
 c3                     ret 

If we can read assembly, we can see that the instruction the compiler chose to insert into the program to initialize the variable a was mov DWORD PTR [rbp-0x4],0x3. In machine code it is written c7 45 fc 03 00 00 00. The instruction is where the number 3 comes from.

The instruction is 7 bytes long. c7 45 tells the CPU what kind of instruction this is ("put a specific number at a specific position in the stack frame"). fc is the position in the stack frame. And 03 00 00 00 is the specific number which it puts there (in little-endian format). This is the number 3 from the source code. So in this case, it takes up 4 bytes.


Note that it's not always the same. If we compile for an ARM CPU instead of x86-64, then these are the relevant instructions:

mov r3, #3
str r3, [fp, #-8]

Unfortunately godbolt won't show us machine code here, but we can look up the MOV # instruction in the ARM manual, which tells us that the instruction is 4 bytes long and that the number being moved takes up at most 2 of those bytes. The other bits of the register are automatically zeroes. If you use a number that doesn't fit in 2 bytes, the compiler obviously uses a different instruction.

Usually we don't talk about the sizes of instructions since they vary a lot more than data sizes. int a; always reserves 4 bytes (if your system's int is 4 bytes), but the instruction which puts the specific bits into that space can have varying sizes.


Even on x86-64, numbers can take up different amounts of space. If I do return 0; with optimizations enabled, the compiler translates that to xor eax, eax (31 c0). There is no number 0 in that instruction at all! (31 c0 is all instruction type, no data.)
