How to interpret binary data in C++?

Question

I am sending and receiving binary data to/from a device in packets (64 bytes). The data has a specific format, parts of which vary with different requests and responses.

Now I am designing an interpreter for the received data. Simply reading the data by position is OK, but it doesn't look that cool when I have a dozen different response formats. I am currently thinking about creating a few structs for that purpose, but I don't know how that will work with padding.

Maybe there's a better way?

Related:

Safe, efficient way to access unaligned data in a network packet from C

Answer 1

You need to use structs and/or unions. You'll need to make sure your data is properly packed on both sides of the connection, and you may want to translate to and from network byte order on each end if there is any chance that either side of the connection could be running with a different endianness.

As an example:

#pragma pack(push)  /* push current alignment to stack */
#pragma pack(1)     /* set alignment to 1 byte boundary */
typedef struct {
    unsigned int    packetID;  // identifies packet in one direction
    unsigned int    data_length;
    char            receipt_flag;  // indicates to ack packet or keep sending packet till acked
    char            data[]; // this is typically ascii string data w/ \n terminated fields but could also be binary
} tPacketBuffer ;
#pragma pack(pop)   /* restore original alignment from stack */

and then when assigning:

packetBuffer.packetID = htonl(123456);

and then when receiving:

packetBuffer.packetID = ntohl(packetBuffer.packetID);

Here are some discussions of Endianness and of Alignment and Structure Packing.

If you don't pack the structure, the compiler may pad its members to their natural alignment boundaries, so the structure's internal layout and its size may not match the wire format.
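To make the padding point concrete, here is a minimal sketch, assuming a POSIX system (for `<arpa/inet.h>`) and a compiler that supports `#pragma pack`. The names `WireHeader` and `parse_header` are hypothetical, and the header keeps only the three fixed-size fields from the struct above. It copies the raw bytes into the packed struct with `std::memcpy`, which avoids casting a possibly unaligned buffer pointer, and then converts the multi-byte fields from network byte order.

```cpp
#include <arpa/inet.h>  // ntohl (POSIX assumption; Windows provides it in <winsock2.h>)
#include <cstdint>
#include <cstring>      // std::memcpy

#pragma pack(push, 1)
struct WireHeader {               // hypothetical name, fixed-size fields only
    std::uint32_t packetID;
    std::uint32_t data_length;
    std::uint8_t  receipt_flag;
};
#pragma pack(pop)

// With packing the struct is exactly 4 + 4 + 1 bytes. Without it, typical
// compilers round the size up (commonly to 12) by adding trailing padding.
static_assert(sizeof(WireHeader) == 9, "WireHeader must match the 9-byte wire layout");

// Copies the first 9 bytes of a received buffer into a header and converts
// the 32-bit fields from network byte order to host byte order.
// The caller must guarantee that at least sizeof(WireHeader) bytes are readable.
inline WireHeader parse_header(const unsigned char* raw) {
    WireHeader h;
    std::memcpy(&h, raw, sizeof h);
    h.packetID = ntohl(h.packetID);
    h.data_length = ntohl(h.data_length);
    return h;
}
```

The `static_assert` turns a silent layout mismatch into a compile error, which is useful when the code is later built with a different compiler or for a different target.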

Answer 2

It's hard to say what the best solution is without knowing the exact format(s) of the data. Have you considered using unions?

Answer 3

I've done this innumerable times before: it's a very common scenario. There are a number of things which I virtually always do.

Don't worry too much about making it the most efficient thing available.

If we do wind up spending a lot of time packing and unpacking packets, then we can always change it to be more efficient. I haven't yet encountered a case where I've had to, though I haven't been implementing network routers!

Whilst using structs/unions is the most efficient approach in terms of runtime, it comes with a number of complications: convincing your compiler to pack the structs/unions to match the octet structure of the packets you need, work to avoid alignment and endianness issues, and a lack of safety, since there is little or no opportunity to do sanity checks on debug builds.
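As an illustration of the alternative to packed structs, the sketch below reads fields from the buffer by position and assembles multi-byte values with shifts, so it does not depend on the compiler's layout or on the host's endianness. The `ByteReader` class is hypothetical, and it assumes the device sends multi-byte fields in big-endian (network) order; a little-endian device would need the shifts reversed.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical helper: sequentially reads big-endian fields from a byte buffer
// and throws if a read would run past the end of the packet.
class ByteReader {
public:
    ByteReader(const std::uint8_t* data, std::size_t size)
        : data_(data), size_(size) {}

    std::uint8_t read_u8() {
        require(1);
        return data_[pos_++];
    }

    std::uint16_t read_u16() {
        require(2);
        const std::uint16_t v =
            static_cast<std::uint16_t>((data_[pos_] << 8) | data_[pos_ + 1]);
        pos_ += 2;
        return v;
    }

    std::uint32_t read_u32() {
        require(4);
        const std::uint32_t v = (std::uint32_t{data_[pos_]} << 24) |
                                (std::uint32_t{data_[pos_ + 1]} << 16) |
                                (std::uint32_t{data_[pos_ + 2]} << 8) |
                                std::uint32_t{data_[pos_ + 3]};
        pos_ += 4;
        return v;
    }

    std::size_t remaining() const { return size_ - pos_; }

private:
    // Sanity check that a struct overlay cannot offer: refuse short packets.
    void require(std::size_t n) const {
        if (size_ - pos_ < n) throw std::out_of_range("packet too short");
    }

    const std::uint8_t* data_;
    std::size_t size_;
    std::size_t pos_ = 0;
};
```

A parser for one response format then becomes a short sequence of `read_*` calls in field order, and the bounds check runs on every read.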

I often wind up with an architecture including the following kinds of things:

A packet base class. Any common data fields are accessible (but not modifiable). If the data isn't stored in a packed format, then there's a virtual function which will produce a packed packet.
A number of presentation classes for specific packet types, derived from the common packet type. If we're using a packing function, then each presentation class must implement it.
Anything which can be inferred from the specific type of the presentation class (i.e. a packet type id from a common data field) is dealt with as part of initialisation and is otherwise unmodifiable.
Each presentation class can be constructed from an unpacked packet, or will gracefully fail if the packet data is invalid for that type. This can then be wrapped up in a factory for convenience.
If we don't have RTTI available, we can get "poor-man's RTTI" using the packet id to determine which specific presentation class an object really is.
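The architecture listed above could be sketched roughly as follows. Everything specific here is an assumption made for illustration: the wire format (byte 0 is the packet type id, the payload follows), the two packet types `StatusResponse` and `VersionResponse`, their ids, and all function names.

```cpp
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Packet base class: the common field is readable but not modifiable.
class Packet {
public:
    virtual ~Packet() = default;
    std::uint8_t type_id() const { return type_id_; }
    virtual std::vector<std::uint8_t> pack() const = 0;  // produce wire bytes
protected:
    explicit Packet(std::uint8_t type_id) : type_id_(type_id) {}
private:
    const std::uint8_t type_id_;  // fixed at initialisation by the derived class
};

// Hypothetical packet type: [0x01][status]
class StatusResponse : public Packet {
public:
    static constexpr std::uint8_t kTypeId = 0x01;
    explicit StatusResponse(std::uint8_t status) : Packet(kTypeId), status_(status) {}
    std::uint8_t status() const { return status_; }
    std::vector<std::uint8_t> pack() const override { return {kTypeId, status_}; }
    // Fails gracefully (returns nullptr) if the bytes are not a valid StatusResponse.
    static std::unique_ptr<StatusResponse> parse(const std::uint8_t* raw, std::size_t size) {
        if (size < 2 || raw[0] != kTypeId) return nullptr;
        return std::make_unique<StatusResponse>(raw[1]);
    }
private:
    std::uint8_t status_;
};

// Hypothetical packet type: [0x02][major][minor]
class VersionResponse : public Packet {
public:
    static constexpr std::uint8_t kTypeId = 0x02;
    VersionResponse(std::uint8_t major_version, std::uint8_t minor_version)
        : Packet(kTypeId), major_(major_version), minor_(minor_version) {}
    std::uint8_t major_version() const { return major_; }
    std::uint8_t minor_version() const { return minor_; }
    std::vector<std::uint8_t> pack() const override { return {kTypeId, major_, minor_}; }
    static std::unique_ptr<VersionResponse> parse(const std::uint8_t* raw, std::size_t size) {
        if (size < 3 || raw[0] != kTypeId) return nullptr;
        return std::make_unique<VersionResponse>(raw[1], raw[2]);
    }
private:
    std::uint8_t major_;
    std::uint8_t minor_;
};

// Factory: picks the presentation class from the type id.
// Returns nullptr for unknown ids or malformed data.
inline std::unique_ptr<Packet> parse_packet(const std::uint8_t* raw, std::size_t size) {
    if (size == 0) return nullptr;
    switch (raw[0]) {
        case StatusResponse::kTypeId:  return StatusResponse::parse(raw, size);
        case VersionResponse::kTypeId: return VersionResponse::parse(raw, size);
        default:                       return nullptr;
    }
}
```

Without RTTI, a caller can compare `type_id()` against `VersionResponse::kTypeId` and then `static_cast` the base reference to the matching presentation class, which is the "poor-man's RTTI" mentioned above.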

In all of this, it's possible (even if just for debug builds) to verify that each field which is modifiable is being set to a sane value. Whilst it might seem like a lot of work, it makes it very difficult to have an invalidly formatted packet, and a pre-packed packet's contents can be easily checked by eye using a debugger (since it's all in normal platform-native format variables).

If we do have to implement a more efficient storage scheme, that too can be wrapped in this abstraction with little additional performance cost.

Answer 4

I agree with Wuggy. You can also use code generation to do this. Use a simple data-definition file to define all your packet types, then run a Python script over it to generate prototype structures and serialization/deserialization functions for each one.
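A related variant that stays inside C++ is the X-macro idiom, where the preprocessor plays the role of the generator. This is a substitute technique, not the Python script described above, and the field list, the `StatusPacket` name, and the helper functions are all hypothetical. The sketch assumes unsigned integer fields sent in big-endian order.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical data definition: one X(type, name) entry per field, in wire order.
#define STATUS_PACKET_FIELDS(X) \
    X(std::uint8_t, type_id)    \
    X(std::uint8_t, status)     \
    X(std::uint16_t, voltage_mv)

// Writes an unsigned integer in big-endian order; returns the bytes written.
template <typename T>
std::size_t put_be(std::uint8_t* out, T value) {
    for (std::size_t i = 0; i < sizeof(T); ++i)
        out[i] = static_cast<std::uint8_t>(value >> (8 * (sizeof(T) - 1 - i)));
    return sizeof(T);
}

// Reads an unsigned integer in big-endian order; returns the bytes consumed.
template <typename T>
std::size_t get_be(const std::uint8_t* in, T& value) {
    std::uint64_t v = 0;
    for (std::size_t i = 0; i < sizeof(T); ++i) v = (v << 8) | in[i];
    value = static_cast<T>(v);
    return sizeof(T);
}

// The struct is generated from the field list.
struct StatusPacket {
#define X(type, name) type name;
    STATUS_PACKET_FIELDS(X)
#undef X
};

// Serialization is generated from the same list; `out` must have room for every field.
inline std::size_t serialize(const StatusPacket& p, std::uint8_t* out) {
    std::size_t n = 0;
#define X(type, name) n += put_be<type>(out + n, p.name);
    STATUS_PACKET_FIELDS(X)
#undef X
    return n;
}

// Deserialization likewise; `in` must hold at least as many bytes as serialize() writes.
inline std::size_t deserialize(const std::uint8_t* in, StatusPacket& p) {
    std::size_t n = 0;
#define X(type, name) n += get_be<type>(in + n, p.name);
    STATUS_PACKET_FIELDS(X)
#undef X
    return n;
}
```

Adding a field means editing only the definition list, which is the same benefit the external generator gives, at the cost of code that is harder to step through in a debugger.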

Answer 5

This is an "out-of-the-box" solution, but I'd suggest taking a look at the Python construct library.

Construct is a Python library for parsing and building of data structures (binary or textual). It is based on the concept of defining data structures in a declarative manner, rather than procedural code: more complex constructs are composed of a hierarchy of simpler ones. It's the first library that makes parsing fun, instead of the usual headache it is today.

construct is very robust and powerful, and just reading the tutorial will help you understand the problem better. The author also has plans for auto-generating C code from definitions, so it's definitely worth the effort to read about.

5 answers

Answer 1
8 2009-05-12 12:04:23

Answer 2
3 2009-05-12 12:03:44

Answer 3
3 accepted 2009-05-12 20:22:20

Answer 4
1 2009-05-12 23:30:52

Answer 5
1 2009-05-15 12:30:59
