
Reading struct/union members from a character buffer

I need to process data that is given to me as a char buffer where the actual structure of the data depends on the values of some of its fields.

More specifically, consider the following header file:

struct IncomingMsgStruct
{
    MsgHdrStruct msgHdr;
    char         msgData[MSG_DATA_MAX_SIZE]; // Can hold any of several structures
};

struct RelevantMessageData
{
    DateTimeStruct   dateTime;
    CommonDataStruct commonData;
    MsgBodyUnion     msgBody;
};

struct DateTimeStruct { /* ... */ };

struct CommonDataStruct
{
    char        name[NAME_MAX_SIZE + 1];
    MsgTypeEnum msgType;
    // more elements here
};

union MsgBodyUnion
{
    MsgBodyType1Struct  msgBodyType1;
    MsgBodyType2Struct  msgBodyType2;
    // ...
    MsgBodyTypeNStruct  msgBodyTypeN;
};

struct MsgBodyType1Struct  { /* ... */ };
struct MsgBodyType2Struct  { /* ... */ };
// ...
struct MsgBodyTypeNStruct  { /* ... */ };

The structures contain data members (some of which are also structures) and member functions for initialization, conversion to string, etc. There are no constructors, destructors, virtual functions, or inheritance.
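With no user-declared constructors, destructor, virtual functions or base classes, such types are usually trivially copyable. That property is what makes byte-wise copying with std::memcpy well-defined. The header can change over time, so it may be worth guarding the property with static_assert. Below is a minimal sketch using a hypothetical stand-in struct (ExampleBodyStruct is not from the legacy header). In real code, the assertions would name the actual types, e.g. RelevantMessageData.

```cpp
#include <cassert>
#include <string>
#include <type_traits>

// Hypothetical stand-in for one of the legacy message structs: plain data
// members plus ordinary member functions (initialization, conversion to
// string), but no user-declared constructors, destructor, virtual functions
// or base classes.
struct ExampleBodyStruct {
    int    id;
    double value;

    void init() { id = 0; value = 0.0; }
    std::string toString() const { return std::to_string(id); }
};

// Non-virtual member functions do not affect these traits, so copying the
// object's bytes with std::memcpy is allowed for such a type.
static_assert(std::is_trivially_copyable_v<ExampleBodyStruct>,
              "memcpy-based access requires a trivially copyable type");
// Standard layout makes offsetof well-defined for the type's members.
static_assert(std::is_standard_layout_v<ExampleBodyStruct>,
              "offsetof requires a standard-layout type");
```

If a later header change adds, say, a virtual function, the build fails at the static_assert instead of silently producing undefined behaviour.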

Please note that this is in the context of legacy code that I have no control over. The header and the definitions in it are used by other components, and some of them can change over time.

The data is made available to me as a buffer of characters, so my processing function will look like:

ResultType processRelevantMessage(char const* inBuffer);

It is guaranteed that inBuffer contains an IncomingMsgStruct structure and that its msgData member holds a RelevantMessageData structure. Correct alignment and endianness are also guaranteed, as the data originated from the corresponding structures on the same platform.

For simplicity, let's assume that I am only interested in the case where msgType equals a specific value, so only the members of, say, MsgBodyType2Struct will need to be accessed (and an error returned otherwise). I can generalize it to handle several types later.

My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.

My question is:

How can I do it in standard-compliant C++ without invoking undefined behaviour, without changing or duplicating the definitions, and without extra copying or allocations?

Or, if that is not possible, how can I do it in GCC (possibly using flags such as -fno-strict-aliasing etc.)?

> My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.

Indeed. Also, consider that an array of bytes might start at an arbitrary address in memory, whereas a struct typically has some alignment restrictions that need to be satisfied. The safest way to deal with this is to create a new object of the desired type, and use std::memcpy() to copy the bytes from the buffer into the object:

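#include <cstring>

ResultType processRelevantMessage(char const* inBuffer) {
    MsgHdrStruct hdr;
    std::memcpy(&hdr, inBuffer, sizeof hdr);
    // ... inspect the header ...
    RelevantMessageData data;
    std::memcpy(&data, inBuffer + sizeof hdr, sizeof data);
    // ... check data.commonData.msgType, then use data.msgBody.msgBodyType2 ...
}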

The above is well-defined C++ code: you can use hdr and data afterwards without problems, as long as those are trivially copyable types (such as POD types) that don't contain any pointers.
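A self-contained sketch of the same memcpy approach follows. All type definitions in it are hypothetical stand-ins for the legacy header, whose real contents are not shown in the question, and readType2Body is an illustrative name. Instead of copying the whole RelevantMessageData, it copies only CommonDataStruct to check msgType, and then only the MsgBodyType2Struct. The offsets come from offsetof, which assumes the types are standard-layout. Compilers typically turn a fixed-size std::memcpy into ordinary loads, so the copies are often cheaper than they look.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <optional>

// ---- Hypothetical stand-ins for the legacy header (not the real definitions) ----
enum MsgTypeEnum { MSG_TYPE_1 = 1, MSG_TYPE_2 = 2 };

constexpr std::size_t MSG_DATA_MAX_SIZE = 256;
constexpr std::size_t NAME_MAX_SIZE = 8;

struct MsgHdrStruct       { int length; int sequence; };
struct DateTimeStruct     { int date; int time; };
struct CommonDataStruct   { char name[NAME_MAX_SIZE + 1]; MsgTypeEnum msgType; };
struct MsgBodyType1Struct { int code; };
struct MsgBodyType2Struct { double price; int quantity; };

union MsgBodyUnion {
    MsgBodyType1Struct msgBodyType1;
    MsgBodyType2Struct msgBodyType2;
};

struct RelevantMessageData {
    DateTimeStruct   dateTime;
    CommonDataStruct commonData;
    MsgBodyUnion     msgBody;
};

struct IncomingMsgStruct {
    MsgHdrStruct msgHdr;
    char         msgData[MSG_DATA_MAX_SIZE];
};
// ---- End of stand-ins ----

// Hypothetical helper: returns the type-2 body, or std::nullopt if the
// message has another type.
std::optional<MsgBodyType2Struct> readType2Body(const char* inBuffer) {
    // Start of the RelevantMessageData object inside the incoming message.
    const char* data = inBuffer + offsetof(IncomingMsgStruct, msgData);

    // Copy only the common part first, to check the message type.
    CommonDataStruct common;
    std::memcpy(&common, data + offsetof(RelevantMessageData, commonData),
                sizeof common);
    if (common.msgType != MSG_TYPE_2)
        return std::nullopt;

    // Every union member starts at the union's address, so the type-2 body
    // starts at the offset of msgBody.
    MsgBodyType2Struct body;
    std::memcpy(&body, data + offsetof(RelevantMessageData, msgBody),
                sizeof body);
    return body;
}
```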

I suggest using a serialization library or writing operator<< and operator>> overloads for those structs. You could use the functions htonl and ntohl, which are available on some platforms, or write a support class to stream numeric values yourself.
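Here is a minimal sketch of htonl/ntohl applied to a single 32-bit field. It assumes a POSIX system, where the functions are declared in <arpa/inet.h>; on Windows they come from <winsock2.h>. The helper names writeU32Net and readU32Net are illustrative.

```cpp
#include <arpa/inet.h> // POSIX: htonl, ntohl (not part of standard C++)
#include <cassert>
#include <cstdint>
#include <cstring>

// Stores a 32-bit value into `out` in network (big-endian) byte order.
void writeU32Net(std::uint32_t value, char* out) {
    std::uint32_t net = htonl(value);
    std::memcpy(out, &net, sizeof net);
}

// Loads a 32-bit value stored in network byte order from `in`.
std::uint32_t readU32Net(const char* in) {
    std::uint32_t net;
    std::memcpy(&net, in, sizeof net);
    return ntohl(net);
}
```

These functions only cover 16- and 32-bit integers (htons/ntohs, htonl/ntohl), which is one reason a small template class like the one below can be more convenient.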

Such a class could look like this:

#include <algorithm>
#include <bit>          // std::endian (C++20)
#include <cstring>
#include <iostream>
#include <iterator>
#include <limits>
#include <type_traits>

template<class T>
struct tfnet { // to/from net (or file)
    static_assert(std::endian::native == std::endian::little ||
                  std::endian::native == std::endian::big); // endianness must be known
    static_assert(std::numeric_limits<double>::is_iec559);  // only IEEE 754 floating point is supported
    static_assert(std::is_arithmetic_v<T>);                 // only for arithmetic types

    tfnet(T& v) : val(&v) {} // store a pointer to the value to be streamed

    // write a value to a stream in network (big-endian) byte order
    friend std::ostream& operator<<(std::ostream& os, const tfnet& n) {
        if constexpr(std::endian::native == std::endian::little) {
            // reverse byte order to be in network byte order
            char buf[sizeof(T)];
            std::memcpy(buf, n.val, sizeof buf);
            std::reverse(std::begin(buf), std::end(buf));
            os.write(buf, sizeof buf);
        } else {
            // already in network byte order: write the object's bytes as-is
            os.write(reinterpret_cast<const char*>(n.val), sizeof(T));
        }
        return os;
    }

    // read a value stored in network byte order from a stream
    friend std::istream& operator>>(std::istream& is, const tfnet& n) {
        char buf[sizeof(T)];
        if(is.read(buf, sizeof buf)) {
            if constexpr(std::endian::native == std::endian::little) {
                // reverse byte order to convert from network to native order
                std::reverse(std::begin(buf), std::end(buf));
            }
            std::memcpy(n.val, buf, sizeof buf);
        }
        return is;
    }
    T* val;
};
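The class above depends on knowing the host's byte order. A different technique, not the one used in this answer, builds the big-endian byte sequence with shifts and masks. Because shifts operate on the value rather than on its memory representation, the result is the same on any host, and neither std::endian nor std::memcpy is needed. Below is a minimal sketch for unsigned integers only. The names writeBigEndian and readBigEndian are illustrative, and signed and floating-point values would need extra handling that tfnet already covers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>
#include <type_traits>

// Writes an unsigned integer most significant byte first (network byte order).
template <class U>
std::ostream& writeBigEndian(std::ostream& os, U value) {
    static_assert(std::is_unsigned_v<U>, "unsigned integers only");
    for (std::size_t i = sizeof(U); i-- > 0;) {
        os.put(static_cast<char>((value >> (8 * i)) & 0xFFu));
    }
    return os;
}

// Reads an unsigned integer written by writeBigEndian.
// On a short read, `value` is left unchanged and the stream's failbit is set.
template <class U>
std::istream& readBigEndian(std::istream& is, U& value) {
    static_assert(std::is_unsigned_v<U>, "unsigned integers only");
    U result = 0;
    for (std::size_t i = 0; i < sizeof(U); ++i) {
        char c;
        if (!is.get(c)) return is;
        result = static_cast<U>((result << 8) | static_cast<unsigned char>(c));
    }
    value = result;
    return is;
}
```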

Now, if you have a set of structs:

#include <cstdint>

struct data {
    std::uint16_t x = 10;
    std::uint32_t y = 20;
    std::uint64_t z = 30;
};

struct compound {
    data x;
    int y = 40;
};

You can add the streaming operators for them:

std::ostream& operator<<(std::ostream& os, const data& d) {
    return os << tfnet{d.x} << tfnet{d.y} << tfnet{d.z};
}
std::istream& operator>>(std::istream& is, data& d) {
    return is >> tfnet{d.x} >> tfnet{d.y} >> tfnet{d.z};
}

std::ostream& operator<<(std::ostream& os, const compound& d) {
    return os << d.x << tfnet{d.y}; // using data's operator<< for d.x
}
std::istream& operator>>(std::istream& is, compound& d) {
    return is >> d.x >> tfnet{d.y}; // using data's operator>> for d.x
}
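These operators write each member separately, so the serialized form contains no padding bytes and does not depend on how the compiler lays out the struct in memory. The small sketch below compares the two sizes for the data struct from above. The struct is repeated so that the snippet compiles on its own, and serializedSize is an illustrative name. On typical platforms sizeof(data) is 16, because two padding bytes follow x.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct data {
    std::uint16_t x = 10;
    std::uint32_t y = 20;
    std::uint64_t z = 30;
};

// Bytes written by operator<< for data: the three fields back to back.
constexpr std::size_t serializedSize =
    sizeof(std::uint16_t) + sizeof(std::uint32_t) + sizeof(std::uint64_t);

// The in-memory object is at least as large, and usually larger, because
// the compiler may insert padding to align y and z.
static_assert(sizeof(data) >= serializedSize);
```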

And reading/writing the structs:

#include <sstream>

int main() {
    std::stringstream ss;
    
    compound x;
    compound y{{0,0,0},0};

    ss << x; // write to stream
    ss >> y; // read from stream
}


If you can't use the streaming operators directly on the source streams, you can put the char buffer you receive into a std::istringstream and extract the data from it using the added operators.
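Here is a minimal sketch of that last suggestion, assuming the received buffer holds 32-bit fields in network byte order. The helpers readBigEndianU32 and parseTwoFields are hypothetical stand-ins for the tfnet-based operators above, so that the snippet compiles on its own. Note that std::istringstream copies the bytes into its own string, so this route does not meet the question's no-copy requirement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: reads a big-endian 32-bit unsigned value from a stream.
bool readBigEndianU32(std::istream& is, std::uint32_t& value) {
    unsigned char bytes[4];
    if (!is.read(reinterpret_cast<char*>(bytes), sizeof bytes))
        return false;
    value = (std::uint32_t{bytes[0]} << 24) | (std::uint32_t{bytes[1]} << 16) |
            (std::uint32_t{bytes[2]} << 8) | std::uint32_t{bytes[3]};
    return true;
}

// Wraps a received char buffer in an istringstream (this copies the bytes)
// and extracts two consecutive 32-bit fields from it.
bool parseTwoFields(const char* buffer, std::size_t size,
                    std::uint32_t& first, std::uint32_t& second) {
    std::istringstream in(std::string(buffer, size));
    return readBigEndianU32(in, first) && readBigEndianU32(in, second);
}
```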
