
What happens if arguments passed to sscanf are cast

While reviewing an old piece of code, I stumbled upon a coding horror like this one:

struct Foo
{
    unsigned int  bar;
    unsigned char qux;
    unsigned char xyz;
    unsigned int  etc;
};

void horror(const char* s1, const char* s2, const char* s3, const char* s4, Foo* foo)
{
    sscanf(s1, "%u", &(foo->bar));
    sscanf(s2, "%u", (unsigned int*) &(foo->qux));
    sscanf(s3, "%u", (unsigned int*) &(foo->xyz));
    sscanf(s4, "%u", &(foo->etc));
}

So, what is actually happening in the second and third sscanf calls, where the argument passed is an unsigned char* cast to unsigned int*, but the format specifier is the one for an unsigned integer? Whatever happens is due to undefined behavior, but why is this even "working"?

As far as I know, the cast effectively does nothing in this case (the actual type of the arguments passed as ... is unknown to the called function). However, this has been in production for years and has never crashed, and the surrounding values apparently are not overwritten, I suppose because the members of the structure are all aligned to 32 bits. It even reads the correct value on the target machine (a little-endian 32-bit ARM), but I think it would no longer work on a machine with a different endianness.

Bonus question: what is the cleanest correct way to do this? I know that now we have the %hhu format specifier (apparently introduced by C++11), but what about a legacy C89 compiler?


Please note that the original question had uint32_t instead of unsigned int and uint8_t instead of unsigned char, but that was just misleading and off topic; besides, the original code I was reviewing uses its own typedefs.

From the pointer's point of view, nothing happens in this case: on all modern machines, pointers have the same representation for all object types, so the cast does not change the address that is passed.

But because you use the wrong format, scanf will write outside the memory allocated to the variables, and that is undefined behaviour.
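For comparison, here is a minimal sketch of a conversion whose format matches its argument. It assumes a C99 (or later) library, where the hh length modifier is available; read_uchar is a hypothetical helper name, not something from the code under review.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: parse a decimal number from s straight into an
 * unsigned char. "%hhu" tells sscanf that the destination is an
 * unsigned char, so it stores exactly sizeof(unsigned char) bytes.
 * Returns 1 on success, 0 if no number could be converted. */
static int read_uchar(const char *s, unsigned char *out)
{
    assert(s != NULL && out != NULL);
    return sscanf(s, "%hhu", out) == 1;
}
```

Because the store is one byte wide, a neighbouring byte is left alone. Input that does not fit in an unsigned char is not detected by this sketch.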

Bonus question: what is the cleanest correct way to do this? I know that now we have the %hhu format specifier (apparently introduced by C++11), but what about a legacy C89 compiler?

The <stdint.h> header and its types were introduced in C99, so a C89 compiler won't support them except as an extension.

The correct way to use the *scanf() and *printf() families of functions with the various fixed or minimum-width types is to use the macros from <inttypes.h>. For example:

#include <inttypes.h>
#include <stdlib.h>
#include <stdio.h>

int main(void) {
  int8_t foo;
  uint_least16_t bar;

  puts("Enter two numbers");
  if (scanf("%" SCNd8 " %" SCNuLEAST16, &foo, &bar) != 2) {
    fputs("Input failed!\n", stderr);
    return EXIT_FAILURE;
  }
  printf("You entered %" PRId8 " and %" PRIuLEAST16 "\n", foo, bar);
}

First of all, this of course invokes undefined behaviour.

But that kind of horror was quite common in old code, where the C language was used as a higher-level assembly language. So here are 2 possible behaviours:

  • the structure has a 32-bit alignment. All is (more or less) fine on a little-endian machine: the uint8_t members will receive the least significant byte of the 32-bit value and the padding bytes will be zeroed (I assume that the program does not try to store a value greater than 255 into a uint8_t).
  • the structure does not have a 32-bit alignment, but the architecture allows scanf to write into misaligned variables. The least significant byte of the value read for qux will correctly go into qux, and the next 3 zero bytes will erase xyz and part of etc. On the next line, xyz receives its value and etc receives one more 0 byte. And finally etc will receive its value. This could have been a rather common hack in the early 1980s on an 8086-type machine.
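The overlapping stores described above can be replayed without invoking undefined behaviour by working on a byte image of the structure. This is only a sketch: store_uint_at and replay_horror are hypothetical names, and what ends up in qux and xyz depends on the platform's layout and endianness.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct Foo {
    unsigned int  bar;
    unsigned char qux;
    unsigned char xyz;
    unsigned int  etc;
};

/* Simulate what "%u" through a cast pointer does on a machine that
 * tolerates it: store a whole unsigned int starting at a member's byte
 * offset. Using memcpy on a byte image keeps the sketch well defined. */
static void store_uint_at(unsigned char *image, size_t offset,
                          unsigned int value)
{
    memcpy(image + offset, &value, sizeof value);
}

/* Replay the four stores of horror() in the same order. */
static struct Foo replay_horror(unsigned int bar, unsigned int qux,
                                unsigned int xyz, unsigned int etc)
{
    unsigned char image[sizeof(struct Foo)] = {0};
    struct Foo result;

    assert(sizeof image >= offsetof(struct Foo, etc) + sizeof etc);
    store_uint_at(image, offsetof(struct Foo, bar), bar);
    store_uint_at(image, offsetof(struct Foo, qux), qux); /* spills past qux */
    store_uint_at(image, offsetof(struct Foo, xyz), xyz); /* spills past xyz */
    store_uint_at(image, offsetof(struct Foo, etc), etc); /* rewrites etc */

    memcpy(&result, image, sizeof result);
    return result;
}
```

On a little-endian machine, each oversized store leaves its low byte in the intended member, and the later stores repair whatever the earlier ones clobbered, which is why the original code appears to work. On a big-endian machine the byte that lands in qux is the most significant one, so small values would read back as 0.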

For a portable way, I would use a temporary unsigned integer:

unsigned int u;  /* "%u" requires a pointer to unsigned int */
sscanf(s1, "%u", &(foo->bar));
sscanf(s2, "%u", &u);
foo->qux = (unsigned char) u;
sscanf(s3, "%u", &u);
foo->xyz = (unsigned char) u;
sscanf(s4, "%u", &(foo->etc));

and trust the compiler to generate code as efficient as the horror way.
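The temporary-variable approach can also be wrapped in a small helper that checks the conversion result and the range before narrowing. This is a sketch using only C89 facilities; scan_uchar is a hypothetical name, and rejecting values above UCHAR_MAX is an assumption about the desired behaviour.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Hypothetical helper: read a decimal number with "%u" into a real
 * unsigned int, then narrow it only after checking that it fits.
 * Returns 1 on success, 0 on a parse failure or an out-of-range value;
 * *out is left untouched on failure. */
static int scan_uchar(const char *s, unsigned char *out)
{
    unsigned int tmp;

    assert(s != NULL && out != NULL);
    if (sscanf(s, "%u", &tmp) != 1)
        return 0;                 /* nothing was converted */
    if (tmp > UCHAR_MAX)
        return 0;                 /* would not fit in an unsigned char */
    *out = (unsigned char) tmp;
    return 1;
}
```

A call such as scan_uchar(s2, &foo->qux) would then replace the sscanf and cast pair, and the caller decides what to do when it returns 0.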

OP's code is UB because the scan specifiers do not match the arguments.

cleanest correct way to do this?

Cleaner

#include <inttypes.h>
#include <stdio.h>

/* Assumes the members are declared as uint32_t / uint8_t,
   as in the original version of the question. */
void horror1(const char* s1, const char* s2, const char* s3, const char* s4, Foo* foo) {
    sscanf(s1, "%" SCNu32, &(foo->bar));
    sscanf(s2, "%" SCNu8, &(foo->qux));
    sscanf(s3, "%" SCNu8, &(foo->xyz));
    sscanf(s4, "%" SCNu32, &(foo->etc));
}

Cleanest

Add additional error handling if desired.

#include <stdint.h>
#include <stdlib.h>  /* strtoul */

void horror2(const char* s1, const char* s2, const char* s3, const char* s4, Foo* foo) {
    /* Each member is parsed from its own input string. */
    foo->bar = (uint32_t) strtoul(s1, 0, 10);
    foo->qux = (uint8_t) strtoul(s2, 0, 10);
    foo->xyz = (uint8_t) strtoul(s3, 0, 10);
    foo->etc = (uint32_t) strtoul(s4, 0, 10);
}
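As one possible shape for that error handling, here is a sketch built on strtoul's end pointer and errno. The helper name parse_ulong_max and the policy of rejecting trailing characters are assumptions, not part of the answer above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parse a base-10 number in [0, max] with strtoul.
 * Returns 1 and stores the value on success; returns 0 if there are no
 * digits, if characters follow the number, or if the value is out of
 * range. *out is left untouched on failure. */
static int parse_ulong_max(const char *s, unsigned long max,
                           unsigned long *out)
{
    char *end;
    unsigned long value;

    assert(s != NULL && out != NULL);
    errno = 0;
    value = strtoul(s, &end, 10);
    if (end == s)
        return 0;                 /* no digits consumed */
    if (*end != '\0')
        return 0;                 /* leftover characters after the number */
    if (errno == ERANGE || value > max)
        return 0;                 /* overflowed or above the caller's limit */
    *out = value;
    return 1;
}
```

For the byte-sized members, a caller could pass UCHAR_MAX as max and then assign the result with a cast, for example foo->qux = (unsigned char) v. Note that strtoul accepts a leading minus sign and wraps the result; the range check catches that for any max below ULONG_MAX.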
