
Use of format specifiers for conversions

I am unable to deduce what happens inside the machine when we print data using format specifiers.

I was trying to understand the concept of signed and unsigned integers and found the following:

unsigned int b=-12;  
printf("%d\n",b);     //prints -12
printf("%u\n\n",b);   //prints 4294967284

I am guessing that b actually stores the binary version of -12 as 11111111111111111111111111110100.

So, since b is unsigned, b technically stores 4294967284. But the format specifier %d still causes the binary value of b to be printed as its signed version, i.e. -12.

However,

printf("%f\n",2);    //prints 0.000000
printf("%f\n",100);   //prints 0.000000
printf("%d\n",3.2);    //prints 2147483639

printf("%d\n",3.1);    //prints 2147483637

I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms.

Why does this not happen, and what exactly takes place at machine level?

Mismatching the format specifier and the argument type (like using the floating-point specifier "%f" to print an int value) leads to undefined behavior.

Remember that 2 is an integer value, and variadic functions (like printf) don't really know the types of their arguments. The printf function has to rely on the format specifier and assume that the argument is of the specified type.


To better understand how you get the results you get, that is, to understand "the internal happenings", we first must make two assumptions:

  • The system uses 32 bits for the int type
  • The system uses 64 bits for the double type

Now what happens with

printf("%f\n",2);    //prints 0.000000

is that the printf function sees the "%f" specifier and fetches the next argument as a 64-bit double value. Since the int value you provided in the argument list is only 32 bits, half of the bits in the double value will be unknown. The printf function will then print the (invalid) double value. If you're unlucky, some of the unknown bits might make the value a trap representation, which can cause a crash.

Similarly with

printf("%d\n",3.2);    //prints 2147483639

the printf function fetches the next argument as a 32-bit int value, losing half of the bits in the 64-bit double value provided as the actual argument. Exactly which 32 bits are copied into the internal int value depends on endianness. On common platforms integers have no trap representations, so no crash happens; an unexpected value is simply printed.

"what exactly takes place at machine level?"

The stdio.h functions are quite far from the machine level. They provide a standardized abstraction layer on top of various OS APIs, whereas "machine level" would refer to the generated assembly code. The behavior you experience is mostly related to details of the C language rather than the machine.

At the machine level there are no signed numbers; everything is treated as raw binary data. The compiler can turn raw binary data into a signed number by using an instruction that tells the CPU: "use what's stored at this location and treat it as a signed number". Specifically, as a two's complement signed number on all common computers. But this is irrelevant when explaining why your code misbehaves.

The integer constant 12 is of type int. When we write -12 we apply the unary - operator to it. The result is still of type int but now has the value -12.

Then you attempt to store this negative number in an unsigned int. This triggers an implicit conversion to unsigned int, which is carried out according to the C standard:

"Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or subtracting one more than the maximum value that can be represented in the new type until the value is in the range of the new type."

The maximum value of a 32-bit unsigned int is 2^32 - 1, which equals 4294967295 (about 4.29*10^9). "One more than the maximum" gives 2^32 = 4294967296. If we calculate -12 + 4294967296 we get 4294967284. This is in the range of an unsigned int and is the result you see later.

Now as it happens, the printf family of functions is very unsafe. If you provide a wrong format specifier that doesn't match the type, they might crash, display the wrong result, and so on: the program invokes undefined behavior.

So when you use %d or %i, which are reserved for signed int, but pass an unsigned int, anything can happen. "Anything" includes the compiler trying to convert the passed type to match the passed format specifier. That's what happened when you used %d.

When you pass values of types that completely mismatch the format specifier, however, the program just prints gibberish, because you are still invoking undefined behavior.

"I kind of expected the 2 to be printed as 2.00000 and 3.2 to be printed as 3 as per type conversion norms."

The reason the printf family can't do anything intelligent, like assuming that 2 should be converted to 2.0, is that they are variadic (variable-argument) functions, meaning they can take any number of arguments. To make that possible, the arguments are essentially passed as raw binary through something called va_list, and all type information is lost. The printf implementation is therefore left with no type information other than the format string you gave it. This is why variadic functions are so unsafe to use.

Compare this with a regular function, which has more type safety: if you declare void foo (float f) and pass the integer constant 2 (type int), the compiler will implicitly convert the integer to float, and perhaps also give a conversion warning.

The behaviors you observe are the result of printf interpreting the bits given to it as the type specified by the format specifier. In particular, at least for your system:

  • The bits for an int argument and an unsigned argument in the same position within the argument list would be passed in the same place, so when you give printf one and tell it to format the other, it uses the bits you give it as if they were the bits of the other.
  • The bits for an int argument and a double argument would be passed in different places (possibly a general register for the int argument and a special floating-point register for the double argument), so when you give printf one and tell it to format the other, it does not get the bits of the double to use for the int; it gets completely unrelated bits that were left lying around by previous operations.

Whenever a function is called, values for its arguments must be placed in certain places. These places vary according to the software and hardware used, and they vary by the type and number of arguments. However, for any particular argument type, argument position, and specific software and hardware used, there is a specific place (or combination of places) where the bits of that argument should be stored to be passed to the function. The rules for this are part of the Application Binary Interface (ABI) for the software and hardware being used.

First, let us neglect any compiler optimization or transformation and examine what happens when the compiler implements a function call in source code directly as a function call in assembly language. The compiler will take the arguments you provide for printf and write them to the places designated for those types of arguments. When printf executes, it examines the format string. When it sees a format specifier, it figures out what type of argument it should have, and it looks for the value of that argument in the place for that type of argument.

Now, there are two things that can happen. Say you passed an unsigned but used a format specifier for int, like %d. In every ABI I have seen, an unsigned and an int argument (in the same position within the list of arguments) are passed in the same place. So, when printf looks for the bits for the int it expects, it will get the bits for the unsigned you passed.

Then printf will interpret those bits as if they encoded the value for an int, and it will print the results. In other words, the bits of your unsigned value are reinterpreted as the bits of an int. (See footnote 1.)

This explains why you see "-12" when you pass the unsigned value 4,294,967,284 to printf to be formatted with %d. When the bits 11111111111111111111111111110100 are interpreted as an unsigned, they represent the value 4,294,967,284. When they are interpreted as an int, they represent the value −12 on your system. (This encoding system is called two's complement. Other encoding systems include one's complement and sign-and-magnitude, in which these bits would represent −11 and −2,147,483,636, respectively. Those systems are rare for plain integer types these days.)

That is the first of two things that can happen, and it is common when you pass the wrong type but it is similar to the correct type in size and nature: the wrong type is passed in the same place as the correct type. The second thing that can happen is that the argument you pass is passed in a different place than the argument that is expected. For example, if you pass a double as an argument, it is, in many systems, placed in a separate set of registers for floating-point values. When printf goes looking for an int argument for %d, it will not find the bits of your double at all. Instead, what it finds in the place where it looks for an int argument might be whatever bits happened to be left in a register or memory location from previous operations, or it might be the bits of the next argument in the list of arguments. In any case, this means that the value printf prints for the %d will have nothing to do with the double value you passed, because the bits of the double are not involved in any way; a completely different set of bits is used.

This is also part of the reason the C standard says it does not define the behavior when the wrong argument type is passed for a printf conversion. Once you have messed up the argument list by passing a double where an int should have been, all the following arguments may be in the wrong places too. They might be in different registers from where they are expected, or they might be in different stack locations from where they are expected. printf has no way to recover from this mistake.

As stated, all of the above neglects compiler optimization. The rules of C arose out of various needs, such as accommodating the problems above and making C portable to a variety of systems. However, once those rules are written, compilers can take advantage of them to allow optimization. The C standard permits a compiler to make any transformation of a program as long as the changed program has the same behavior as the original program under the rules of the C standard. This permission allows compilers to speed up programs tremendously in some circumstances. But a consequence is that, if your program has behavior not defined by the C standard (and not defined by any other rules the compiler follows), the compiler is allowed to transform your program into anything. Over the years, compilers have grown increasingly aggressive about their optimizations, and that trend continues. This means that, aside from the simple behaviors described above, when you pass incorrect arguments to printf, the compiler is allowed to produce completely different results. Therefore, although you may commonly see the behaviors I describe above, you cannot rely on them.

Footnote

1 Note that this is not a conversion. A conversion is an operation whose input is one type and whose output is another type but has the same value (or as nearly the same as is possible, in some sense, as when we convert a double 3.5 to an int 3). In some cases, a conversion does not require any change to the bits: an unsigned 3 and an int 3 use the same bits to represent 3, so the conversion does not change the bits, and the result is the same as a reinterpretation. But they are conceptually different.
