
Aliasing, Type-punning, Unions, Structs and Bit Fields in C99

After receiving the following statement in an answer to this question:

...you are trying to overlay value and bits, and stuffing data into one alternative of an union and taking it out of the other is undefined.

I became much more curious as to what is allowed (and what is prudent) with regard to type punning in C99. After taking a look around I found a lot of helpful information in the post "Is type-punning through a union unspecified in C99...".

There was a lot to take away from both the comments and the answers posted there. For the purpose of clarity (and as a sanity check) I wanted to create an example based on my understanding of the C99 standard. Below is the example code that I created and, while it functioned as I anticipated, I wanted to be certain that my assertions are correct.

The following code contains my assertions in the comments. This is my understanding of type-punning in C99. Are my comments correct? If not, can you please explain why?

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_BYTES   sizeof(uint32_t)
typedef union
{
    uint32_t fourByteValue;
    uint8_t  byteValue[NUM_BYTES];
    struct
    {
        unsigned int firstBitSpecified  :   1;
        unsigned int secondBitSpecified :   1;
        unsigned int thirdBitSpecified  :   1;
        unsigned int fourthBitSpecified :   1;
        unsigned int paddingBits        :   4;
        uint8_t  oneByteStructValue;
        uint16_t twoByteStructValue;
    };
} U;

int main (void)
{
    const char border[] = "==============================\n";
    U myUnion;
    uint8_t byte;
    uint32_t fourBytes;
    uint8_t i;

    myUnion.fourByteValue = 0x00FFFFFF;
    fourBytes = myUnion.fourByteValue;  /* 1. This is not type-punning. */
    printf("No type-punning fourByteValue:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);


    printf("Type-punning byteValue:\n%s", border);
    for (i = 0; i < NUM_BYTES; i++)
    {
        byte = myUnion.byteValue[i];   /* 2. Type-punning allowed by C99, 
                                             no unspecified values. */
        printf ("byte[%d]\t\t= 0x%.2x\n", i, byte);
    }
    printf("\n");

    myUnion.byteValue[3] = 0xff;
    fourBytes = myUnion.fourByteValue; /* 3. Type-punning allowed by C99 
                                             but all other 'byteValue's
                                             are now unspecified values. */
    printf("Type-punning fourByteValue:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);

    myUnion.firstBitSpecified = 0;
    myUnion.thirdBitSpecified = 0;
    fourBytes = myUnion.fourByteValue; /* 4. Again, this would be allowed, but 
                                             the bit that was just assigned
                                             a value of 0 is implementation
                                             defined AND all other bits are
                                             unspecified values. */
    printf("Type-punning firstBitSpecified:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);

    myUnion.fourByteValue = 0x00000001;
    fourBytes = myUnion.firstBitSpecified; /* 5. Type-punning allowed, although
                                                 which bit you get is implementation
                                                 specific. */
    printf("No type-punning, firstBitSpecified:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);
    fourBytes = myUnion.secondBitSpecified;
    printf("No type-punning, secondBitSpecified:\n%s"
           "fourBytes\t= 0x%.4x\n\n", border, fourBytes);

    return (EXIT_SUCCESS);
}

The above code was compiled with mingw32-gcc.exe -Wall -g -std=c99 on a 64-bit Windows 7 machine. Upon running the code I receive the following output:

No type-punning fourByteValue:
==============================
fourBytes       = 0xffffff

Type-punning byteValue:
==============================
byte[0]         = 0xff
byte[1]         = 0xff
byte[2]         = 0xff
byte[3]         = 0x00

Type-punning fourByteValue:
==============================
fourBytes       = 0xffffffff

Type-punning firstBitSpecified:
==============================
fourBytes       = 0xfffffffa

No type-punning, firstBitSpecified:
==============================
fourBytes       = 0x0001

No type-punning, secondBitSpecified:
==============================
fourBytes       = 0x0000

My reading of the footnote linked in that post is that type-punning through a union is never specified. Going from this, the standard says:

With one exception, if a member of a union object is accessed after a value has been stored in a different member of the object, the behavior is implementation-defined.

The footnote doesn't change that. The reason is that C makes no guarantees about either (a) the byte order of numeric types, or (b) the exact layout in memory of the members of a struct (the padding between members and the allocation order of bit-fields are left to the implementation), except insofar as the first member must sit at the "beginning" of the struct with no padding before it (so that you can do the sort of casting they do in GTK to achieve polymorphism).
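To make point (a) concrete, here is a minimal sketch that reads the first byte of a 32-bit value through a union. The names WordBytes and lowest_byte_stored_first are hypothetical, chosen only for this illustration, and the sketch assumes an 8-bit byte (which the existence of uint32_t and a 4-element byte array in the question already implies).

```c
#include <stdint.h>

/* Hypothetical union for illustration: a 32-bit value overlaid with
 * its object representation, viewed as unsigned char. */
typedef union {
    uint32_t      word;
    unsigned char bytes[sizeof(uint32_t)];
} WordBytes;

/* Returns 1 if the least significant byte of a uint32_t is stored at
 * the lowest address (little-endian order), 0 otherwise.  Which answer
 * you get is up to the implementation, not the standard. */
int lowest_byte_stored_first(void)
{
    WordBytes w;

    w.word = 0x01020304u;       /* store through one member...        */
    return w.bytes[0] == 0x04;  /* ...read the first byte via another */
}
```

On the machine in the question, byte[3] printed as 0x00 after storing 0x00FFFFFF, which is consistent with this function returning 1 there. A big-endian implementation would return 0, and both results conform to the standard.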

The footnote in question addresses this line:

When a value is stored in a member of an object of union type, the bytes of the object representation that do not correspond to that member but do correspond to other members take unspecified values, but the value of the union object shall not thereby become a trap representation.

and it says this:

78a If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type as described in 6.2.6 (a process sometimes called "type punning"). This might be a trap representation.

The "reinterpretation as an object representation in the new type" is implementation-defined (because the interpretation of all types, on a byte-by-byte level, is always implementation-defined, taking into account things like endianness, etc.). The footnote just adds more detail to point out that extra-surprising things might happen when you mess with the type system via unions, including causing a trap representation. Looking here for a definition of "trap representation":

A trap representation is a set of bits which, when interpreted as a value of a specific type, causes undefined behavior. Trap representations are most commonly seen on floating point and pointer values, but in theory, almost any type could have trap representations. An uninitialized object might hold a trap representation. This gives the same behavior as the old rule: access to uninitialized objects produces undefined behavior.

The only guarantees the standard gives about accessing uninitialized data are that the unsigned char type has no trap representations, and that padding has no trap representations.

So, by replacing uint8_t with unsigned char in your post, you can avoid undefined behavior, and end up with implementation-specific behavior. As written now, however, UB is not forbidden by the standard.

This is made explicit in a quote in the post you linked:

Finally, one of the changes from C90 to C99 was to remove any restriction on accessing one member of a union when the last store was to a different one. The rationale was that the behaviour would then depend on the representations of the values.

Underlying representations are, by definition, never defined by the standard.
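Returning to the suggestion of using unsigned char for the byte view, here is a minimal sketch of what that could look like. The names ByteView, value_to_bytes and bytes_to_value are hypothetical and not part of the question's code. The sketch relies on two facts: unsigned char has no trap representations, and uint32_t (where it exists) has no padding bits, so any four bytes form a valid value.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical variant of the question's union, with the byte view
 * declared as unsigned char instead of uint8_t. */
typedef union {
    uint32_t      fourByteValue;
    unsigned char byteValue[sizeof(uint32_t)];
} ByteView;

/* Copy the object representation of 'value' into 'out'.
 * The order of the bytes is implementation-defined. */
void value_to_bytes(uint32_t value, unsigned char out[sizeof(uint32_t)])
{
    ByteView u;
    size_t i;

    u.fourByteValue = value;
    for (i = 0; i < sizeof u.byteValue; i++)
        out[i] = u.byteValue[i];
}

/* Rebuild a uint32_t from bytes previously produced by value_to_bytes.
 * Every byte of the union is written before fourByteValue is read, so
 * no byte is left holding an unspecified value. */
uint32_t bytes_to_value(const unsigned char in[sizeof(uint32_t)])
{
    ByteView u;
    size_t i;

    for (i = 0; i < sizeof u.byteValue; i++)
        u.byteValue[i] = in[i];
    return u.fourByteValue;
}
```

Passing a value through value_to_bytes and then bytes_to_value on the same implementation returns the original value. What the individual bytes contain in between is still up to the implementation, which is the point made above about representations.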
