I need to process data that is given to me as a char buffer where the actual structure of the data depends on the values of some of its fields.
More specifically, consider the following header file:
struct IncomingMsgStruct
{
MsgHdrStruct msgHdr;
char msgData[MSG_DATA_MAX_SIZE]; // Can hold any of several structures
};
struct RelevantMessageData
{
DateTimeStruct dateTime;
CommonDataStruct commonData;
MsgBodyUnion msgBody;
};
struct DateTimeStruct { /* ... */ };
struct CommonDataStruct
{
char name[NAME_MAX_SIZE + 1];
MsgTypeEnum msgType;
// more elements here
};
union MsgBodyUnion
{
MsgBodyType1Struct msgBodyType1;
MsgBodyType2Struct msgBodyType2;
// ...
MsgBodyTypeNStruct msgBodyTypeN;
};
struct MsgBodyType1Struct { /* ... */ };
struct MsgBodyType2Struct { /* ... */ };
// ...
struct MsgBodyTypeNStruct { /* ... */ };
The structures contain data members (some of which are also structures) and member functions for initialization, conversion to string, etc. There are no constructors, destructors, virtual functions, or inheritance.
Please note that this is in the context of legacy code that I have no control over. The header and the definitions in it are used by other components, and some of them can change over time.
The data is made available to me as a buffer of characters, so my processing function will look like:
ResultType processRelevantMessage(char const* inBuffer);
It is guaranteed that inBuffer contains an IncomingMsgStruct structure, and that its msgData member holds a RelevantMessageData structure. Correct alignment and endianness are also guaranteed, as the data originated from the corresponding structures on the same platform.
For simplicity, let's assume that I am only interested in the case where msgType equals a specific value, so only the members of, say, MsgBodyType2Struct will need to be accessed (and an error returned otherwise). I can generalize it to handle several types later.
My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules.
My question is:
How can I do it in standard-compliant C++ without invoking undefined behaviour, without changing or duplicating the definitions, and without extra copying or allocations?
Or, if that is not possible, how can I do it in GCC (possibly using flags such as -fno-strict-aliasing etc.)?
"My understanding is that a naive implementation using reinterpret_cast can run afoul of the C++ strict aliasing rules."
Indeed. Also, consider that an array of bytes might start at an arbitrary address in memory, whereas a struct typically has alignment requirements that need to be satisfied. The safest way to deal with this is to create a new object of the desired type, and use std::memcpy() to copy the bytes from the buffer into the object:
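ResultType processRelevantMessage(char const* inBuffer) {
    MsgHdrStruct hdr;
    std::memcpy(&hdr, inBuffer, sizeof hdr); // std::memcpy is declared in <cstring>
    ...
    RelevantMessageData data;
    // There may be padding between msgHdr and msgData, so locate msgData
    // with offsetof (from <cstddef>) rather than assuming sizeof hdr.
    std::memcpy(&data, inBuffer + offsetof(IncomingMsgStruct, msgData), sizeof data);
    ...
}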
The above is well-defined C++ code; you can use hdr and data afterwards without problems (as long as those are POD types that don't contain any pointers).
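To make the memcpy approach concrete for the "only MsgBodyType2Struct" case from the question, here is a minimal sketch. The type definitions below are hypothetical stand-ins (the real ones come from the legacy header and will differ), and extractType2, MSG_TYPE_1 and MSG_TYPE_2 are names invented for illustration. It copies only the msgType field and the one body that is needed, instead of the whole RelevantMessageData.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical stand-ins for the legacy definitions (assumptions, not the real header).
enum MsgTypeEnum { MSG_TYPE_1 = 1, MSG_TYPE_2 = 2 };
struct MsgHdrStruct { int length; };
struct MsgBodyType1Struct { int a; };
struct MsgBodyType2Struct { double value; };
union MsgBodyUnion {
    MsgBodyType1Struct msgBodyType1;
    MsgBodyType2Struct msgBodyType2;
};
struct CommonDataStruct { char name[8]; MsgTypeEnum msgType; };
struct RelevantMessageData { CommonDataStruct commonData; MsgBodyUnion msgBody; };
constexpr std::size_t MSG_DATA_MAX_SIZE = 64;
struct IncomingMsgStruct { MsgHdrStruct msgHdr; char msgData[MSG_DATA_MAX_SIZE]; };

// memcpy-based access is only valid for trivially copyable types.
static_assert(std::is_trivially_copyable_v<RelevantMessageData>);
static_assert(std::is_standard_layout_v<RelevantMessageData>); // required by offsetof
static_assert(sizeof(RelevantMessageData) <= MSG_DATA_MAX_SIZE);

// Returns true and fills `out` only when the message carries a type 2 body.
bool extractType2(char const* inBuffer, MsgBodyType2Struct& out) {
    char const* data = inBuffer + offsetof(IncomingMsgStruct, msgData);

    // Copy just the discriminating field first.
    MsgTypeEnum type;
    std::memcpy(&type,
                data + offsetof(RelevantMessageData, commonData)
                     + offsetof(CommonDataStruct, msgType),
                sizeof type);
    if (type != MSG_TYPE_2) return false;

    // Every member of a standard-layout union starts at the union's own address.
    std::memcpy(&out, data + offsetof(RelevantMessageData, msgBody), sizeof out);
    return true;
}
```

Compilers commonly turn small fixed-size memcpy calls like these into plain loads, so the copy is often cheaper than it looks, but that is an optimization, not a guarantee.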
I suggest using a serialization library or writing operator<< and operator>> overloads for those structs. You could use the functions htonl and ntohl, which are available on some platforms, or write a support class to stream numeric values yourself.
Such a class could look like this:
#include <bit>
#include <algorithm>
#include <cstring>
#include <iostream>
#include <iterator>
#include <limits>
#include <type_traits>

template<class T>
struct tfnet { // to/from net (or file)
    static_assert(std::endian::native == std::endian::little ||
                  std::endian::native == std::endian::big); // endianness must be known
    static_assert(std::numeric_limits<double>::is_iec559);  // only support IEEE754
    static_assert(std::is_arithmetic_v<T>);                 // only for arithmetic types

    tfnet(T& v) : val(&v) {} // store a pointer to the value to be streamed

    // write a value to a stream
    friend std::ostream& operator<<(std::ostream& os, const tfnet& n) {
        if constexpr(std::endian::native == std::endian::little) {
            // reverse byte order to be in network byte order
            char buf[sizeof(T)];
            std::memcpy(buf, n.val, sizeof buf);
            std::reverse(std::begin(buf), std::end(buf));
            os.write(buf, sizeof buf);
        } else {
            // already in network byte order; write() takes a const char*
            os.write(reinterpret_cast<const char*>(n.val), sizeof(T));
        }
        return os;
    }

    // read a value from a stream
    friend std::istream& operator>>(std::istream& is, const tfnet& n) {
        char buf[sizeof(T)];
        if(is.read(buf, sizeof buf)) {
            if constexpr(std::endian::native == std::endian::little) {
                // reverse byte order to go from network to native byte order
                std::reverse(std::begin(buf), std::end(buf));
            }
            std::memcpy(n.val, buf, sizeof buf);
        }
        return is;
    }

    T* val;
};
Now, if you have a set of structs:
#include <cstdint>
struct data {
std::uint16_t x = 10;
std::uint32_t y = 20;
std::uint64_t z = 30;
};
struct compound {
data x;
int y = 40;
};
You can add the streaming operators for them:
std::ostream& operator<<(std::ostream& os, const data& d) {
return os << tfnet{d.x} << tfnet{d.y} << tfnet{d.z};
}
std::istream& operator>>(std::istream& is, data& d) {
return is >> tfnet{d.x} >> tfnet{d.y} >> tfnet{d.z};
}
std::ostream& operator<<(std::ostream& os, const compound& d) {
return os << d.x << tfnet{d.y}; // using data's operator<< for d.x
}
std::istream& operator>>(std::istream& is, compound& d) {
return is >> d.x >> tfnet{d.y}; // using data's operator>> for d.x
}
And reading/writing the structs:
#include <sstream>
int main() {
std::stringstream ss;
compound x;
compound y{{0,0,0},0};
ss << x; // write to stream
ss >> y; // read from stream
}
If you can't use the streaming operators directly on the source streams, you can put the char buffer you do get in an istringstream and extract the data from that using the added operators.
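As a rough sketch of that last step, the following wraps a raw char buffer in a std::istringstream and extracts values from it. To keep it self-contained it does not reuse tfnet; readU32 and parseTwo are hypothetical helpers made up for this illustration. The buffer is passed with an explicit length because binary data may contain '\0' bytes, and note that building the std::string copies the buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical helper: reads one 32-bit value stored in network (big-endian) byte order.
bool readU32(std::istream& is, std::uint32_t& v) {
    unsigned char b[4];
    if (!is.read(reinterpret_cast<char*>(b), sizeof b)) return false;
    v = (std::uint32_t{b[0]} << 24) | (std::uint32_t{b[1]} << 16) |
        (std::uint32_t{b[2]} << 8)  |  std::uint32_t{b[3]};
    return true;
}

// Hypothetical helper: wraps the raw buffer in an istringstream and extracts two values.
bool parseTwo(char const* buf, std::size_t len, std::uint32_t& a, std::uint32_t& b) {
    std::string bytes(buf, len);    // explicit length: embedded '\0' bytes are kept
    std::istringstream is(bytes);   // the stream holds its own copy of the bytes
    return readU32(is, a) && readU32(is, b);
}
```

With the operators defined earlier, the body of parseTwo would instead be something like `is >> someCompound`, under the assumption that the sender wrote the data with the matching operator<<.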