C++ Union member not initialized

I was just beginning my quest with unions when I found something weird.

If I run this program:

    #include <iostream>
    using namespace std;

    union myun {
    public:
        int x;
        char c;
    };

    int main()
    {
        myun y;
        //y.x=65;
        y.c = 'B';
        cout << y.x;
    }

The output was some garbage value, which doesn't change if I change the value of y.c. Next I did this:

    #include <iostream>
    using namespace std;

    union myun {
    public:
        int x;
        char c;
    };

    int main()
    {
        myun y;
        y.x = 65;
        y.c = 'B';
        cout << y.x;
    }

The output was 66, as expected, because y.c='B' replaces the 65 with the ASCII value of 'B' (66). Can anyone explain the first case?

It's actually undefined behaviour to read from a union member that wasn't the last one written to.

The standard does make one exception: if the union holds several standard-layout structs that share a common initial sequence, you may read that shared part through any of them. That's not the case here, since int and char are plain scalar types rather than structs with a common prefix.

From the C++03 standard (superseded by C++11 now but still relevant):

In a union, at most one of the data members can be active at any time, that is, the value of at most one of the data members can be stored in a union at any time.

I think you may want to look into reinterpret_cast if you want to do this sort of overlaying activity.
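If the goal is only to see what overlaying a char on an int's storage does, one well-defined option is std::memcpy, which copies raw bytes without reading an inactive union member. Note that this uses std::memcpy rather than the reinterpret_cast mentioned above. The sketch below is an illustration only, and the helper name overlay_char_on_int is made up for this example:

```cpp
#include <cstring>   // std::memcpy

// Hypothetical helper: overwrites the lowest-addressed byte of an int's
// storage with a char, mimicking what assigning y.c does to the union in
// the question, but without ever reading an inactive union member.
int overlay_char_on_int(int start, char c)
{
    int value = start;
    std::memcpy(&value, &c, sizeof c);   // touches exactly one byte
    return value;                        // the other bytes keep their old contents
}
```

On a little-endian machine, overlay_char_on_int(65, 'B') returns 66, matching the second program in the question. On a big-endian machine the char lands in the most significant byte instead, so the result differs.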


In terms of what's actually happening under the covers in the first one, the hex value of the number output:

-1218142398 (signed) -> 3076824898 (unsigned) -> B7649F42 (hex)
                                                       ==
                                                       ^^
                                                       ||
                                       0x42 is 'B' ----++

should provide a clue. The y.c='B' is only setting a single byte of that union, leaving the other three bytes (in my case) as indeterminate.

By putting in the y.x=65 line before that point, it's setting all four bytes, with those three spare ones being set to zero. Hence they stay at zero when you set the single byte in the following assignment.

Well, you kinda explained the first case when you showed your understanding of the second case.

Initialising the char member only modifies one byte of a union that is as large as an int. Assuming a 32-bit int, that leaves 3 bytes uninitialised, hence the garbage.

Here's the memory usage of your union:

              byte
           0  1  2  3
         +------------
myun::x  | X  X  X  X
myun::c  | X  -  -  -

When you set x, you set an integer, so all four bytes are initialised. When you set c, you only modify a single byte.
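The layout in the diagram can be checked without reading any member's value. The following is a minimal sketch; the function names members_share_address and bytes_untouched_by_c are invented for this illustration:

```cpp
#include <cstddef>   // std::size_t

union myun {
    int x;
    char c;
};

// A union is at least as large as its largest member.
static_assert(sizeof(myun) >= sizeof(int), "the union must be able to hold an int");

// Returns true when both members start at the union's own address,
// i.e. they overlap from byte 0 as the diagram shows.
bool members_share_address()
{
    myun y;   // deliberately uninitialised: only addresses are inspected
    const void* base = &y;
    return static_cast<const void*>(&y.x) == base
        && static_cast<const void*>(&y.c) == base;
}

// Number of bytes in the union that an assignment to c leaves untouched.
std::size_t bytes_untouched_by_c()
{
    return sizeof(myun) - sizeof(char);
}
```

With a 4-byte int and no extra padding, bytes_untouched_by_c() is 3, which is exactly the three bytes that held garbage in the first program.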

    y.c='B';
    cout<<y.x;

This has undefined behaviour. At any given time, a union contains only one of its members. You cannot read the int member while the union actually contains the char member. Because the behaviour is not defined, the compiler is allowed to do whatever it wants with the code.

Because sizeof(int) != sizeof(char).

That is to say, an integer and a character take up different amounts of memory (on the average computer these days, int is 4 bytes and char is 1 byte). The union is only as large as its largest member. Thus, when you set the char, you only set 1 byte of memory, and the other 3 bytes are just random garbage.

Either set the biggest member of the union first, or do something like:

    memset(&y, 0, sizeof(y));

to fill the entire union with zero.
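As a rough sketch of that suggestion (memset is declared in <cstring>), the function below zeroes the union, stores a char, and then copies the bytes out with std::memcpy rather than reading y.x directly, since reading the inactive member is still formally undefined. The name zeroed_then_set_c is made up for this example, and the claim that the untouched bytes stay zero reflects what mainstream compilers do in practice:

```cpp
#include <cstring>   // std::memset, std::memcpy

union myun {
    int x;
    char c;
};

static_assert(sizeof(myun) >= sizeof(int), "the union must be able to hold an int");

// Hypothetical helper: zero the whole union, store a char in it, then
// return the first sizeof(int) bytes interpreted as an int.
int zeroed_then_set_c(char c)
{
    myun y;
    std::memset(&y, 0, sizeof y);        // every byte of the union is now zero
    y.c = c;                             // only the lowest-addressed byte changes
    int out;
    std::memcpy(&out, &y, sizeof out);   // copy the bytes instead of reading y.x
    return out;
}
```

Unlike the first program in the question, the result no longer depends on whatever happened to be on the stack: on a little-endian machine zeroed_then_set_c('B') returns 66.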

In a union, the memory allocated is equal to the size of the largest member, which in your case is int, i.e. 2 bytes with a 16-bit compiler. All members use the same memory space to store their data, so in practice only one member can be stored at a time.

When you assigned the value 'B' to the char member, it stored 66 in its 1 byte of memory. Then you tried to output the int member, which is computed by reading 2 bytes of that memory, only one of which had been set. Hence you got a garbage value.

Local variables of POD type (more specifically, variables with automatic storage duration, which typically live on the stack) aren't initialised to anything when they are declared, so the bytes not affected by your assignment to y.c will contain random garbage. That is 3 bytes with the usual 4-byte int, which is also the common size of int on 64-bit systems.

Also note that the particular byte affected by the assignment to y.c depends on the endianness of the CPU, so this code will behave differently on different systems even if you initialise y.x before assigning to y.c.
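To see which byte of an int sits at the lowest address on a given machine, the bytes can be copied out with std::memcpy. This is only a sketch; byte_at and is_little_endian are names chosen for this illustration:

```cpp
#include <cstddef>   // std::size_t
#include <cstring>   // std::memcpy

// Returns byte i (in address order) of an int's object representation.
// The caller must pass i < sizeof(int).
unsigned char byte_at(int value, std::size_t i)
{
    unsigned char bytes[sizeof(int)];
    std::memcpy(bytes, &value, sizeof value);
    return bytes[i];
}

// True when the lowest-addressed byte holds the least significant bits.
bool is_little_endian()
{
    return byte_at(1, 0) == 1;
}
```

On a little-endian CPU the char member overlaps the low-order byte of the int, which is why the second program in the question prints 66. On a big-endian CPU the same assignment would change the high-order byte instead.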

The variable y is of the union type, and it is four bytes long. For instance, y's memory layout is like this:

---------------------------------
| byte1 | byte2 | byte3 | byte4 |
---------------------------------

1) In the first program, the statement y.c='B'; only sets byte1, while byte2, byte3 and byte4 keep whatever random values were on the stack.

2) In the second program, the statement y.x=65; sets byte1 to 65 and byte2, byte3 and byte4 to zero (on a little-endian machine). Then the statement y.c='B'; sets byte1 to the ASCII value of 'B', hence the output of 66.
