Reading binary data from a .bin file into a struct in C++

Question

I have a set of .bin files containing data in a formally specified format. I know exactly how many bytes there are for each field, e.g. name = 40 bytes, version number = 2 bytes, etc. I also know the exact order in which they are stored in the file (e.g. name, then version number, ...).

So far I can load the data from a file into a std::vector<unsigned char>, then step through that data and read the fields according to the number of expected bytes.
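For reference, the loading step described above might look like this minimal sketch. `load_file` is a hypothetical name, and the sketch assumes the whole file fits in memory.

```cpp
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: read an entire binary file into a byte vector.
std::vector<unsigned char> load_file(const std::string& path) {
    // std::ios::binary prevents newline translation on platforms such as Windows.
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }
    // istreambuf_iterator reads raw bytes without skipping whitespace.
    return std::vector<unsigned char>(std::istreambuf_iterator<char>(in),
                                      std::istreambuf_iterator<char>());
}
```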

The issue is that this method is very long and error-prone should I get any of the fields wrong (there are a lot of different fields).

I've looked into, and talked to people about, struct packing, pointer casting and bit fields. I just can't seem to get them all to work together.

How can I read the data into my buffer and then 'overlay' my struct on the buffer, so that all the fields are populated according to the bit fields I've allocated to each value in the struct?

The issue with bit fields is that I can't use them for strings.

Advice or example code would be highly appreciated. If you'd like, just comment and I can post code showing what I have so far and what I'm trying to achieve.

#include <string>
#include <vector>

struct dataFields
{
    int ID : 8;
    // Cannot use bit field for string type?
    std::string name;
    int versionNumber : 16;
    int someOtherValue : 8;
};

int main()
{
    //File data loaded by function call
    std::vector<unsigned char> fileData;

    //How do I cast fileData to be a dataFields type?
}

I cannot share the exact code I'm working on for work reasons, but I feel this summarises what I'm trying to do fairly well in a simple manner.

Answer 1

No, you indeed cannot use a bit field for std::string, and you wouldn't want to anyway, since it contains just a few pointers rather than the characters themselves.

The usual approach I use in my projects is to have a POD struct for each record type. The lowest layer, responsible for (de)serialization, then converts only between PODs and bytes. Any C++ logic, like std::string or variable-length std::vector, is dealt with at higher levels.
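A sketch of that split: the byte layer only sees the fixed-size POD, and a higher layer turns the 40-byte name field into a std::string. `Record` and `dataFields` follow the names used in this thread; `to_fields` is hypothetical, and it assumes the name is NUL-padded (ending at the first '\0'), which is an assumption about the file format.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <string>

// Low-level POD mirror of the on-disk record (fixed-size fields only).
struct Record {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// Higher-level type that is free to use std::string.
struct dataFields {
    int ID;
    std::string name;
    int versionNumber;
    int someOtherValue;
};

// Hypothetical conversion: assumes the name ends at the first '\0',
// or fills all 40 bytes if there is none.
dataFields to_fields(const Record& r) {
    const char* begin = r.name.data();
    const void* nul = std::memchr(begin, '\0', r.name.size());
    std::size_t len = nul ? static_cast<std::size_t>(static_cast<const char*>(nul) - begin)
                          : r.name.size();
    return dataFields{r.ID, std::string(begin, len), r.versionNumber, r.someOtherValue};
}
```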

#include <array>
#include <cstddef>      // offsetof
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Record{
    std::uint8_t ID;
    std::array<char,40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// Natural alignment on common ABIs: 1 + 40 + 1 (padding) + 2 + 1 + 1 (padding) = 46 bytes.
static_assert(sizeof(Record)==46);
static_assert(offsetof(Record,name)==1);

In my world, I try to have each Record respect the standard alignment of sizeof(E) for each element E. You can add packed modifiers if needed. Prefer types from <cstdint> over bit fields.
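If the file has no padding (here 1 + 40 + 2 + 1 = 44 bytes), the naturally aligned Record (46 bytes on common ABIs) will not match it. A sketch using `#pragma pack`, which GCC, Clang and MSVC accept but which is a compiler extension, not standard C++; `PackedRecord` and `parse_packed` are hypothetical names.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Compiler extension (GCC, Clang, MSVC): no padding between members.
#pragma pack(push, 1)
struct PackedRecord {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};
#pragma pack(pop)

// With packing, the struct matches a 44-byte on-disk layout exactly.
static_assert(sizeof(PackedRecord) == 44, "unexpected padding");
static_assert(offsetof(PackedRecord, versionNumber) == 41, "versionNumber must follow name");

// Copy out rather than reinterpret_cast the buffer: avoids aliasing problems
// and references to misaligned members.
PackedRecord parse_packed(const unsigned char* bytes) {
    PackedRecord r;
    std::memcpy(&r, bytes, sizeof r);
    return r;
}
```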

I recommend putting a bunch of static_asserts after each Record to verify its layout. Otherwise someone will one day come along and try to "clean up" the code, breaking everything. It also nicely documents the protocol for the reader.
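For the naturally aligned Record, a fuller set of layout checks might look like this. The struct is repeated so the snippet compiles on its own, and the offsets assume a typical ABI where `std::uint16_t` has 2-byte alignment, leaving one padding byte after `name`.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

struct Record {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// Each assert documents one fact about the wire layout.
static_assert(std::is_trivially_copyable_v<Record>, "Record must be memcpy-able");
static_assert(std::is_standard_layout_v<Record>, "offsetof requires standard layout");
static_assert(offsetof(Record, ID) == 0);
static_assert(offsetof(Record, name) == 1);
static_assert(offsetof(Record, versionNumber) == 42);  // byte 41 is padding
static_assert(offsetof(Record, someOtherValue) == 44);
static_assert(sizeof(Record) == 46);                   // one trailing padding byte
```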

One downside is that this does not easily support putting variable-length members in the middle of a record, or having several of them, but I have never needed to do that; keep packets simple.
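A sketch of the simple case this approach does handle: a single variable-length payload at the end, described by a length field in a fixed POD header. `Header`, `payloadSize`, `Message` and `read_message` are hypothetical names, and the length is read in host byte order for brevity.

```cpp
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <string>

// Hypothetical fixed-size header: the only variable-length data is a
// trailing payload whose size is stored here.
struct Header {
    std::uint16_t versionNumber;
    std::uint16_t payloadSize;
};

struct Message {
    Header header;
    std::string payload;  // variable-length part, handled above the POD layer
};

// Reads one message from [bytes, bytes + size); throws if the data is short.
Message read_message(const unsigned char* bytes, std::size_t size) {
    Message m;
    if (size < sizeof(Header)) throw std::runtime_error("truncated header");
    std::memcpy(&m.header, bytes, sizeof(Header));
    if (size - sizeof(Header) < m.header.payloadSize) throw std::runtime_error("truncated payload");
    const char* p = reinterpret_cast<const char*>(bytes + sizeof(Header));
    m.payload.assign(p, m.header.payloadSize);
    return m;
}
```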

Also, I just pick a fixed endianness for the protocol. If someone needs something else, it's their responsibility to pass correctly encoded Records for serialization.
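Fixing the byte order means decoding multi-byte fields explicitly instead of relying on the host's order. A sketch assuming little-endian data on disk (the thread does not say which order the files use); `load_le16` and `store_le16` are hypothetical helpers.

```cpp
#include <cstdint>

// Decode a little-endian 16-bit value; gives the same result on any host.
std::uint16_t load_le16(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

// Encode a 16-bit value as little-endian bytes.
void store_le16(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v & 0xFF);
    p[1] = static_cast<unsigned char>(v >> 8);
}
```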

Serialization helpers:

template<typename T>
T read_value(const unsigned char*& ptr){
    // memcpy is only valid for trivially copyable types.
    static_assert(std::is_trivially_copyable_v<T>);

    T value;
    std::memcpy(&value,ptr,sizeof(T));   // copy bytes out; no alignment or aliasing issues
    ptr+=sizeof(T);                      // advance the cursor past the value
    return value;
}

template<typename T>
void write_value(unsigned char*& ptr, const T& value){
    static_assert(std::is_trivially_copyable_v<T>);

    std::memcpy(ptr,&value,sizeof(T));
    ptr+=sizeof(T);
}
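A bounds-checked variant of `read_value` that also tracks how many bytes remain, so a short or truncated file throws instead of reading past the buffer. `read_value_checked` is a hypothetical name; like `std::memcpy`, it requires a trivially copyable `T`.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <type_traits>

// Hypothetical bounds-checked reader: advances ptr and decrements remaining.
template <typename T>
T read_value_checked(const unsigned char*& ptr, std::size_t& remaining) {
    static_assert(std::is_trivially_copyable_v<T>, "memcpy needs a trivially copyable type");
    if (remaining < sizeof(T)) {
        throw std::out_of_range("not enough bytes for value");
    }
    T value;
    std::memcpy(&value, ptr, sizeof(T));
    ptr += sizeof(T);
    remaining -= sizeof(T);
    return value;
}
```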

The lowest layer responsible for (de)serialization can look something like this:

void deserialize_stream(const unsigned char* bytes){
    // Output is a bunch of POD types.
    auto record1 = read_value<Record>(bytes);
    auto record2 = read_value<Record>(bytes);
}

void serialize_stream(unsigned char* bytes){
    // Input is a list of POD types to serialize.
    Record record1{1,"Foo",12,42};
    Record record2{2,"Bar",14,28};

    write_value(bytes,record1);
    write_value(bytes,record2);
}

Example

int main() {
    // Just an example; CHECK the buffer SIZE in real code.
    std::array<unsigned char,1024> buffer;

    serialize_stream(buffer.data());
    deserialize_stream(buffer.data());
}
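Tying this back to the question: read the file into a buffer, then copy whole records out with `std::memcpy` instead of casting the buffer to a struct pointer. This sketch assumes the file is nothing but a sequence of naturally aligned `Record`s written with the same layout and byte order as the reading platform; `read_records` is a hypothetical name.

```cpp
#include <array>
#include <cstdint>
#include <cstring>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

struct Record {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

// Hypothetical: load a file and split it into whole Records.
std::vector<Record> read_records(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) throw std::runtime_error("cannot open " + path);
    // Extra parentheses avoid the "most vexing parse".
    std::vector<unsigned char> bytes((std::istreambuf_iterator<char>(in)),
                                     std::istreambuf_iterator<char>());
    if (bytes.size() % sizeof(Record) != 0)
        throw std::runtime_error("file size is not a multiple of sizeof(Record)");
    std::vector<Record> records(bytes.size() / sizeof(Record));
    // Copy into properly aligned objects instead of reinterpret_cast'ing the buffer.
    if (!bytes.empty()) std::memcpy(records.data(), bytes.data(), bytes.size());
    return records;
}
```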

Answer 2

Consider using a serialization library if this part is not constrained by time or storage efficiency. Such libraries can serialize your objects to XML or JSON and deserialize them easily, so you do not need to worry about endianness or POD issues.

2 answers

Answer 1 (accepted): 0 votes, 2021-11-03 11:53:36

Answer 2: 0 votes, 2021-11-03 15:06:02
