
C++ understanding Unions and Structs

I've come to work on an ongoing project where some unions are defined as follows:

/* header.h */
typedef union my_union_t {
  float data[4];
  struct {
    float varA;
    float varB;
    float varC;
    float varD;
  };
} my_union;

If I understand correctly, unions are for saving space, so sizeof(my_union_t) equals the size of its largest member. What are the advantages of using the declaration above instead of this one:

typedef struct my_struct {
  float varA;
  float varB;
  float varC;
  float varD;
};

Won't the space allocated for both of them be the same?

And how can I initialize varA, varB, ... from my_union?

Unions are not mostly for saving space, but for implementing sum types (for that, you put the union in some struct or class that also has a discriminating field holding the run-time tag). Also, I suggest using a recent C++ standard, at least C++11, since it has better support for unions (e.g. it more easily permits unions of objects and their construction or initialization).

The advantage of using your union is being able to index the n-th float (with 0 <= n <= 3) as u.data[n].

To assign a union field in some variable declared as my_union u; just write, e.g., u.varB = 3.14; which in your case has the same effect as u.data[1] = 3.14;


A good example of a justified use of a union is a mutable object that can hold either an int or a string (you could not use derived classes in that case):

#include <new>      // placement new
#include <string>
#include <utility>  // std::move

class IntOrString {
   bool isint;
   union {
      int num;         // active when isint is true
      std::string str; // active when isint is false
   };
 public:
   IntOrString(int n=0) : isint(true), num(n) {}
   IntOrString(std::string s) : isint(false), str(std::move(s)) {}
   // str is not constructed yet in the copy/move constructors,
   // so it must be created with placement new, not assigned
   IntOrString(const IntOrString& o) : isint(o.isint)
      { if (isint) num = o.num; else new (&str) std::string(o.str); }
   IntOrString(IntOrString&& p) : isint(p.isint)
      { if (isint) num = p.num;
        else new (&str) std::string(std::move(p.str)); }
   ~IntOrString() { if (!isint) str.~basic_string(); }
   void set (int n)
     { if (!isint) str.~basic_string(); isint=true; num=n; }
   void set (std::string s)
     { if (isint) { new (&str) std::string(std::move(s)); isint=false; }
       else str = std::move(s); }
   bool is_int() const { return isint; }
   int as_int() const { return isint ? num : 0; }
   std::string as_string() const { return isint ? std::string() : str; }
 };

Notice the explicit calls to the destructor of the str field. Notice also that you can safely use IntOrString in a standard container (std::vector<IntOrString>).

See also std::optional, available since C++17 (conceptually it is a tagged union with void).

BTW, in Ocaml, you simply code:

 type intorstring = Integer of int | String of string;;

and you'll use pattern matching. If you want to make that mutable, you'll need to make a record or a reference of it.

You had better use unions in an idiomatic C++ way (see this for general advice).

Unions are often used when implementing a variant like object (a type field and a union of data types), or in implementing serialisation.

The way you are using a union is a recipe for disaster.

You are assuming that the struct in the union packs the floats with no gaps between them! The standard guarantees that float data[4]; is contiguous, but not the structure members. The only other thing you know is that the address of varA is the same as the address of data[0].

Never use a union in this way.

As for your question: "And how can I initialize varA,varB... from my_union?". The answer is: access the structure members in the normal long-winded way, not via the data[] array.

The advantage is that with a union you can access the same memory in two different ways.

In your example the union contains four floats. You can access those floats as varA, varB... which might be more descriptive names or you can access the same variables as an array data[0], data[1]... which might be more useful in loops.

With a union you can also use the same memory for different kinds of data; you might find that useful for things like writing a function that tells you whether you are on a big-endian or a little-endian CPU.

No, it is not for saving space. It is for the ability to represent some binary data as various data types. For example:

#include <cstddef>
#include <iostream>

union Foo{
   int x;
   struct
   {
      unsigned char b0, b1, b2, b3;
   } y; // named member, so the bytes are reachable as bar.y.b0 ... bar.y.b3
   char z[sizeof(int)];
};

int main()
{
   Foo bar;
   bar.x = 100;

   std::cout << std::hex; // print numbers in hexadecimal
   for(std::size_t i = 0; i < sizeof(int); i++)
   {
      std::cout << "0x" << (int)bar.z[i] << " "; // the (int) cast prints the values as numbers, not as characters
   }

   return 0;
}

Output: 0x64 0x0 0x0 0x0

The same values are stored in the struct bar.y, not in an array but in structure members. That is because my machine is little-endian. If it were big-endian, the output would be reversed: 0x0 0x0 0x0 0x64

You can achieve the same using reinterpret_cast :

#include <iostream>
#include <stdint.h>

int main()
{

   int x = 100;
   char * xBytes = reinterpret_cast<char*>(&x);

   std::cout << std::hex; // to show number in hexadec repr;
   for (size_t i = 0; i < sizeof(int); i++)
   {
      std::cout << "0x" << (int)xBytes[i] << " "; // the (int) cast prints the values as numbers, not as characters
   }

   return 0;
}

It is useful, for example, when you need to read a binary file that was written on a machine with a different endianness than yours. You can just access the values as a byte array and swap those bytes as you wish.

It is also useful when you have to deal with bit fields, but that is a whole different story :)

I think the best way to understand unions is just to give two common practical examples.

The first example is working with images. Imagine you have an RGB image that is arranged in one long buffer.
What most people would do is represent the buffer as a char* and then loop over it in steps of 3 to get the R, G, B values.

What you could do instead, is make a little union, and use that to loop over the image buffer:

union RGB
{
   char raw[3];
   struct
   {
      char R;
      char G;
      char B;
   } colors;
};

// buffer is assumed to be the char* image buffer described above
RGB* pixel = reinterpret_cast<RGB*>(buffer);
// pixel->colors.R == the red value of the first pixel

Another very useful application of unions is working with registers and bit fields.
Let's say you have a 32-bit value that represents some hardware register.
Sometimes, to save space, the 32 bits are split into bit fields, but you also want access to the whole register as a single 32-bit value.
This saves the bit-shift calculations that many programmers write by hand.

union MySpecialRegister
{
    uint32_t raw; // "register" is a reserved keyword in C++, so the member needs another name
    struct
    {
       unsigned int firstField           : 5;
       unsigned int somethingInTheMiddle : 21; // field widths must total 32 bits: 5 + 21 + 6
       unsigned int lastField            : 6;
    } data;
};
// Now you can read the raw register into the raw member,
// then read the individual fields through the inner data struct

Unions are mainly used to represent the same data in different ways. Imagine if you have a user-defined structure containing various data, such as:

typedef struct
{
  int x;
  int y;
  float f;
} my_struct;

Now you want to send this structure to another computer over a serial bus. But your serial bus hardware can only send 1 byte at a time. And on the receiver side, there is a hardware buffer of x bytes. In order to be able to send the data type at all, you can make a union:

typedef union
{ 
  my_struct s;
  uint8_t   bytes[sizeof(my_struct)];
} my_union;

Now you can use the bytes member of the union whenever you want to send and receive, and the s member when you want to access the actual data.

The above concept is used a lot in hardware-related programming: data protocols, hardware register definitions, memory mapping, NVM drivers and so on.

NOTE: whenever you do things like this, be careful with padding and alignment! The compiler is free to insert padding bytes anywhere inside a struct/union to align data. This can break the above concept.

As pointed out in other answers, you can also use unions to implement "sum type"/"variants", but the practical use for such is very limited, if such a use even exists.

First of all: avoid unions where the same memory is accessed through different types!

Unions do not save space at all. They only define multiple names for the same memory area! And you can only store one of the elements at a time in a union.

if you have

union X
    {
        int x;
        char y[4];
     };

you can store an int OR 4 chars, but not both! The general problem is that nobody knows which data is actually stored in a union. If you store an int and read the chars, the compiler will not check that, and there is no run-time check either. A common solution is to wrap the union in a struct with an additional member that records the currently stored type as an enum.

 struct Y
 {
      enum { IS_CHAR, IS_INT } tinfo;
      union
      {
           int x;
           char y[4];
       };
  };

But in C++ you should always use classes or structs that derive from a (possibly empty) parent class, like this:

class Base
{
};

class Int_Type: public Base
{
    ...
    int x;
};

class Char_Type: public Base
{
     ...
    char y[4];
};

So you can declare pointers to Base which can actually hold an Int_Type or a Char_Type for you. With virtual functions you can access the members in an object-oriented way.

As already mentioned in Basile's answer, a useful case can be access via different names to the same type.

union X
{
      struct
      {
          float a;
          float b;
      } data; // named member; "struct data { ... };" would only declare a nested type
      float arr[2];
 };

which allows different ways of accessing the same data with the same type. Using different types stored in the same memory should be avoided altogether!
