
Accessing C union members via pointers

Does accessing union members via a pointer, as in the example below, result in undefined behavior in C99? The intent seems clear enough, but I know that there are some restrictions regarding aliasing and unions.

union { int i; char c; } u;

int  *ip = &u.i;
char *ic = &u.c;

*ip = 0;
*ic = 'a';
printf("%c\n", u.c);

It is unspecified (subtly different from undefined) behaviour to access a union by any element other than the one that was last written. That's detailed in C99 annex J:

The following are unspecified:
:
The value of a union member other than the last one stored into (6.2.6.1).

However, since you are writing to c via the pointer, then reading c, this particular example is well defined. It does not matter how you write to the element:

u.c = 'a';        // direct write.
*(&(u.c)) = 'a';  // variation on yours, writing through element pointer.
(&u)->c = 'a';    // writing through a pointer to the union.
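As a rough check of the claim above, the three ways of writing can be exercised side by side. This is only a sketch: the union tag `sample` and the function name `all_writes_agree` are made up for illustration and are not part of the question.

```c
/* Hypothetical names: "sample" and "all_writes_agree" are illustrative only. */
union sample { int i; char c; };

/* Returns 1 if each way of storing into member c reads back through c
   with the value just stored, 0 otherwise. */
int all_writes_agree(void)
{
    union sample u;
    int ok = 1;

    u.i = 0;            /* make i the last member stored into */
    u.c = 'a';          /* direct write */
    ok &= (u.c == 'a');

    u.i = 0;
    *(&(u.c)) = 'b';    /* write through a pointer to the member */
    ok &= (u.c == 'b');

    u.i = 0;
    (&u)->c = 'c';      /* write through a pointer to the union */
    ok &= (u.c == 'c');

    return ok;
}
```

In every case the member read is the member last stored into, so the Annex J item about reading a different member never comes into play.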

One issue raised in the comments seems to contradict that. User davmac provides sample code:

// Compile with "-O3 -std=c99" eg:
//  clang -O3 -std=c99 test.c
//  gcc -O3 -std=c99 test.c
// On clang v3.5.1, output is "123"
// On gcc 4.8.4, output is "1073741824"
//
// Different outputs, so either:
// * program invokes undefined behaviour; both compilers are correct OR
// * compiler vendors interpret standard differently OR
// * one compiler or the other has a bug

#include <stdio.h>

union u
{
    int i;
    float f;
};

int someFunc(union u * up, float *fp)
{
    up->i = 123;
    *fp = 2.0;     // does this set the union member?
    return up->i;  // then this should not return 123!
}

int main(int argc, char **argv)
{
    union u uobj;
    printf("%d\n", someFunc(&uobj, &uobj.f));
    return 0;
}

which outputs different values on different compilers. However, I believe this is because the code actually violates the rules: it writes to member f then reads member i and, as shown in Annex J, that's unspecified.

Footnote 82 in 6.5.2.3 states:

If the member used to access the contents of a union object is not the same as the member last used to store a value in the object, the appropriate part of the object representation of the value is reinterpreted as an object representation in the new type.

However, since this seems to go against the Annex J comment and it's a footnote to the section dealing with expressions of the form x.y, it may not apply to accesses via a pointer.

One of the major reasons why aliasing is supposed to be strict is to allow the compiler more scope for optimisation. To that end, the standard dictates that treating memory as a different type from the one written is unspecified.

By way of example, consider the function provided:

int someFunc(union u * up, float *fp)
{
    up->i = 123;
    *fp = 2.0;     // does this set the union member?
    return up->i;  // then this should not return 123!
}

The implementation is free to assume that, because you're not supposed to alias memory, up->i and *fp are two distinct objects. So it's free to assume that you're not changing the value of up->i after you set it to 123, so it can simply return 123 without looking at the actual variable contents again.

If instead, you changed the pointer setting statement to:

up->f = 2.0;

then that would make footnote 82 applicable and the returned value would be a re-interpretation of the float as an integer.
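A minimal sketch of that variant follows, compared against a byte copy done with memcpy. The names `someFuncViaMember` and `floatBitsAsInt` are hypothetical, and the sketch assumes that int and float have the same size and that every float bit pattern is a valid int representation.

```c
#include <string.h>

union u
{
    int i;
    float f;
};

/* Hypothetical variant of someFunc: the store goes through the union
   member itself rather than through a separate float pointer. */
int someFuncViaMember(union u *up)
{
    up->i = 123;
    up->f = 2.0f;   /* f is now the member last stored into */
    return up->i;   /* per footnote 82, reinterprets the float's bytes */
}

/* Reference result: copy the float's object representation into an int.
   Assumes sizeof(int) == sizeof(float). */
int floatBitsAsInt(float value)
{
    int bits;
    memcpy(&bits, &value, sizeof bits);
    return bits;
}
```

Under those assumptions, `someFuncViaMember(&obj)` and `floatBitsAsInt(2.0f)` should agree. On a platform with 32-bit int and IEEE 754 single-precision float that common value is 1073741824 (0x40000000), which matches the gcc output quoted in the sample code.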

The reason I don't think that's an issue for the question is that you're writing then reading the same type, so aliasing rules don't come into play.


It's interesting to note that the unspecified behaviour is caused not by the function itself, but by calling it thus:

union u up;
int x = someFunc (&up, &(up.f)); // <- aliasing here

If you were instead to call it so:

union u up;
float down;
int x = someFunc (&up, &down); // <- no aliasing

that would not be a problem.
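The non-aliasing call can be put together as a small sketch. The wrapper name `callWithoutAliasing` is hypothetical; `someFunc` is repeated from the sample code so the fragment stands alone.

```c
union u
{
    int i;
    float f;
};

/* Same function as in the sample code above. */
int someFunc(union u *up, float *fp)
{
    up->i = 123;
    *fp = 2.0f;
    return up->i;
}

/* Hypothetical wrapper: fp points at a separate float object, so the
   store through it cannot touch the union. */
int callWithoutAliasing(void)
{
    union u up;
    float down;
    return someFunc(&up, &down);
}
```

Here the store through fp lands in `down`, so the function returns 123 whether or not the compiler reloads up->i.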

No, it won't, but you need to keep track of what the last type you put into the union was. If I were to reverse the order of your int and char assignments it would be a very different story:

#include <stdio.h>

union { int i; char c; } u;

int main()
{
    int  *ip = &u.i;
    char *ic = &u.c;

    *ic = 'a';
    *ip = 123456;

    printf("%c\n", u.c); /* trying to print a char even though 
                            it's currently storing an int,
                            in this case it prints '@' on my machine */

    return 0;
}

EDIT: Some explanation on why it may have printed 64 ('@').

The binary representation of 123456 is 0001 1110 0010 0100 0000.

For 64 it is 0100 0000.

You can see that the low-order 8 bits are identical. On a little-endian machine those bits are stored in the first byte of the int, which is the byte that u.c occupies, so printf prints only that much.
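The arithmetic can be checked with a short sketch. The helper names below are made up for illustration, and the sketch assumes int is wide enough to hold 123456. It inspects the int through an unsigned char pointer, which is allowed because character types may alias any object.

```c
/* Hypothetical helpers for illustration only. */

/* Returns the low-order 8 bits of value. */
unsigned lowByte(unsigned value)
{
    return value & 0xFFu;
}

/* Returns the byte stored at the lowest address of an int, which is
   the byte a char member of the same union would overlap. */
unsigned firstByteInMemory(int value)
{
    const unsigned char *bytes = (const unsigned char *)&value;
    return bytes[0];
}

/* Returns 1 on a little-endian machine, where the low-order byte of
   an int comes first in memory. */
int isLittleEndian(void)
{
    return firstByteInMemory(1) == 1;
}
```

`lowByte(123456u)` is 64 on any machine, while `firstByteInMemory(123456)` is 64 only where `isLittleEndian()` returns 1. On a big-endian machine the char member would overlap the high-order byte instead and a different character would be printed.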

The only reason it's not UB is because you were lucky/unlucky enough to choose char for one of the types, and character types can alias anything in C. If the types were, for example, int and float, the accesses via pointers would be aliasing violations and thus undefined behavior. For direct access via the union, the behavior was deemed well defined as part of the interpretation for Defect Report 283:

http://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_283.htm

Of course, you still need to ensure that the representation of the type used for writing can also be interpreted as a valid (non-trap) representation for the type later used for reading.
