How do I pack the byte arrays of all data members into a single vector?

Question

I have created a function serialize which takes Data (a class containing four members: int32, int64, float, double) as input and returns an encoded vector of bytes of all members, which I will then pass to a deserialize function to get the original data back.

std::vector<uint8_t> serialize(Data &D)
{
    std::vector<uint8_t> serialized_data;
    std::vector<uint8_t> intwo = encode(D.Int32);  // output [32 13 24 0]
    std::vector<uint8_t> insf = encode(D.Int64);    // output [233 244 55 134 255 23 55]
    // float: copy the raw bytes of the value (binary format)
    float ft = D.Float;    // float value, e.g. 4.55
    char result[sizeof(float)];
    memcpy(result, &ft, sizeof(ft));
    // double: copy the raw bytes of the value (binary format)
    double dt = D.Double;    // double value, e.g. 4.55
    char resultdouble[sizeof(double)];
    memcpy(resultdouble, &dt, sizeof(dt));
    /////
    ///// How to bind everything here
    /////

    return serialized_data;
}


Data deserialize(std::vector<uint8_t> &Bytes)  // vector returned from the function above
{
    
     Data D2;
  
    D2.Int64 = decode(Bytes, D2);
    // D2.Int32 = decode(Bytes, D2);
    // D2.float = decode(Bytes, D2);
    // D2.double = decode(Bytes, D2);
    
    /// Return original data ( All class members)
    return D2;
}

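I don't know how to proceed. Q1: If I bind everything into one vector, how will I split it apart again when deserializing? Should there be some kind of delimiter? Q2: Is there a better way to do this?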

Answer 1

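If I bind everything into one vector, how will I split it apart again when deserializing? Should there be some kind of delimiter?

In a stream, you either know which type comes next, or you need some kind of type indicator in the stream itself, along the lines of "here comes a vector of int with size ...":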

vector int size elem1 elem2 ... elemX

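Depending on how many types you need to support, the type information may take 1 or more bytes. If the smallest "unknown" entity is your class, then you need one indicator per class you want to support.

If you know exactly what the stream is supposed to contain, you can leave out the type information for vector and int: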

size elem1 elem2 ... elemX
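The two layouts above can be sketched roughly as follows. This is only an illustration under assumptions: the `Tag` enum, the helper names `put_u32`, `get_u32`, `encode_tagged` and `decode_tagged`, the 4-byte little-endian integers and the 32-bit element count are all hypothetical choices, not something the answer prescribes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical one-byte type tag; the value is arbitrary.
enum class Tag : std::uint8_t { Int32Vector = 1 };

// Append a 32-bit value as 4 little-endian bytes.
void put_u32(std::vector<std::uint8_t>& out, std::uint32_t v) {
    for (int shift = 0; shift < 32; shift += 8)
        out.push_back(static_cast<std::uint8_t>(v >> shift));
}

// Read 4 little-endian bytes starting at pos and advance pos past them.
std::uint32_t get_u32(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    assert(pos + 4 <= in.size());
    std::uint32_t v = 0;
    for (int shift = 0; shift < 32; shift += 8)
        v |= static_cast<std::uint32_t>(in[pos++]) << shift;
    return v;
}

// Layout: tag, element count, then the elements.
std::vector<std::uint8_t> encode_tagged(const std::vector<std::int32_t>& xs) {
    std::vector<std::uint8_t> out;
    out.push_back(static_cast<std::uint8_t>(Tag::Int32Vector));
    put_u32(out, static_cast<std::uint32_t>(xs.size()));
    for (std::int32_t x : xs) put_u32(out, static_cast<std::uint32_t>(x));
    return out;
}

// Check the tag, read the count, then read exactly that many elements.
std::vector<std::int32_t> decode_tagged(const std::vector<std::uint8_t>& in) {
    std::size_t pos = 0;
    assert(!in.empty() && in[pos] == static_cast<std::uint8_t>(Tag::Int32Vector));
    ++pos;
    const std::uint32_t n = get_u32(in, pos);
    std::vector<std::int32_t> xs;
    for (std::uint32_t i = 0; i < n; ++i)
        xs.push_back(static_cast<std::int32_t>(get_u32(in, pos)));
    return xs;
}
```

No delimiter is needed here: the count tells the reader how many elements follow, and every element has a fixed width. Dropping the tag byte gives the shorter layout for streams whose content is known in advance.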

Q2: Is there a better way to do this?

One simplification could be to make serialize more generic so that you can reuse it. If you have some

std::vector<uint8_t> encode(const T& x)

overloads for the fundamental types (and possibly container types) you want to support, you could write it as:

template <class... Ts>
std::vector<uint8_t> serialize(Ts&&... ts) {
    std::vector<uint8_t> serialized_data;

    [](auto& data, auto&&... vs) {
        (data.insert(data.end(), vs.begin(), vs.end()), ...);
    }(serialized_data, encode(ts)...);

    return serialized_data;
}
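To make the serialize template above concrete, here is one possible set of encode overloads it could work with. This is a sketch under assumptions: copying the object representation with memcpy (so the bytes are in host byte order) and the 32-bit length prefix for strings are illustrative choices, not the encoding used in the question. The template is repeated so that the block compiles on its own.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Fixed-size arithmetic types: copy the object representation (host byte order).
template <class T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
std::vector<std::uint8_t> encode(const T& x) {
    std::vector<std::uint8_t> out(sizeof(T));
    std::memcpy(out.data(), &x, sizeof(T));
    return out;
}

// Strings: a 32-bit length prefix followed by the characters.
std::vector<std::uint8_t> encode(const std::string& s) {
    auto out = encode(static_cast<std::uint32_t>(s.size()));
    out.insert(out.end(), s.begin(), s.end());
    return out;
}

// The serialize template from the answer, unchanged (requires C++17).
template <class... Ts>
std::vector<std::uint8_t> serialize(Ts&&... ts) {
    std::vector<std::uint8_t> serialized_data;

    [](auto& data, auto&&... vs) {
        (data.insert(data.end(), vs.begin(), vs.end()), ...);
    }(serialized_data, encode(ts)...);

    return serialized_data;
}
```

With these in place, the class from the question could be handled with `serialize(D.Int32, D.Int64, D.Float, D.Double)`. Note the declaration order: overloads for fundamental types and for std types have to be visible before the serialize template is defined, because argument-dependent lookup will not find them in the global namespace later.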

You can then write the encoding for a class simply by calling serialize with all its member variables, which also makes encoding composite types very easy:

struct Foo {
    int32_t x;                  // encode(int32_t) needed
    std::string y;              // encode(const string&) needed
    std::vector<std::string> z; // encode(const vector<T>&) + encode(const string&)
};

std::vector<uint8_t> encode(const Foo& f) {
    return serialize(f.x, f.y, f.z);
}

struct Bar {
    Foo f;                      // encode(const Foo&) needed
    std::string s;              // encode(const string&) needed
};

std::vector<uint8_t> encode(const Bar& b) {
    return serialize(b.f, b.s);
}

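The above makes encoding classes very simple. To add serialization, you can add an adapter that only holds a reference to the object to serialize, encodes it, and writes the encoded data to an ostream: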

struct BarSerializer {
    Bar& b;
    friend std::ostream& operator<<(std::ostream& os, const BarSerializer& bs) {
        auto s = encode(bs.b);  // encode(const Bar&) needed
        return os.write(reinterpret_cast<const char*>(s.data()), s.size());
    }    
};

You would make deserialize a function template and provide decode overloads in a similar way.
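As a rough sketch of what that decoding side might look like: the `Reader` cursor, the `decode(Reader&, T&)` signature and the 32-bit string length prefix are all assumptions made for illustration, and they only match data that was encoded with the same conventions (raw host-order bytes for arithmetic types).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical read cursor over the encoded bytes.
struct Reader {
    const std::vector<std::uint8_t>& bytes;
    std::size_t pos = 0;
};

// Fixed-size arithmetic types: copy sizeof(T) bytes out and advance the cursor.
template <class T, std::enable_if_t<std::is_arithmetic_v<T>, int> = 0>
void decode(Reader& r, T& x) {
    assert(r.pos + sizeof(T) <= r.bytes.size());
    std::memcpy(&x, r.bytes.data() + r.pos, sizeof(T));
    r.pos += sizeof(T);
}

// Strings: read the 32-bit length prefix, then that many characters.
void decode(Reader& r, std::string& s) {
    std::uint32_t n = 0;
    decode(r, n);
    assert(r.pos + n <= r.bytes.size());
    s.assign(r.bytes.begin() + r.pos, r.bytes.begin() + r.pos + n);
    r.pos += n;
}

// Decode the values in the same order in which they were encoded.
template <class... Ts>
void deserialize(Reader& r, Ts&... ts) {
    (decode(r, ts), ...);
}
```

Because the reader knows the expected type of each field, it consumes exactly the right number of bytes per field and no delimiter is required. A class would get its own `decode(Reader&, Foo&)` overload that calls `deserialize(r, f.x, f.y, ...)` on its members.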

Answer 2

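There is a high-throughput solution to this, but it comes with a few caveats.

You need a compiler and an architecture that support packed alignment. GCC, clang, ICC and MSVC all do, but efficiency depends on your architecture. The good news, probably: i386 / x86_64 is famously forgiving here and pays little or no penalty for unaligned memory access. SIMD will not work.
You must use POD members in your struct: std::vector, std::string, maps, sets, deques, lists and smart pointers will not work here, but a bunch of integers and floats is fine. This can be worked around with custom reimplementations of those other structures, but let's keep it simple. You can embed other structs and so on, as long as they are POD too. (POD == plain old data, https://en.cppreference.com/w/cpp/language/classes#POD_class )
Your data must have the same byte order on the sender and on the receiver. (This can also be handled with custom data types that implement e.g. operator int32_t(), with the endianness selected via #define or consteval.)
Your communication channel either sends a single struct over and over, or can rely on a common header (for multiple struct types) to dispatch on in a switch.

Your code then becomes: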

#pragma pack(push,1)  
struct D
{
   int32_t Int32;
   int64_t Int64;
   float Float;
   double Double;
};
#pragma pack(pop)

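// Views the object as raw bytes; the result points at sizeof(D) bytes
// and is valid only while d is alive.
const char * serialize(const D& d)
{
    return reinterpret_cast<const char *>(&d);
}

// buffer must hold at least sizeof(D) bytes produced by serialize
// and must outlive the returned reference.
const D& deserialize(const char * buffer)
{
    return *reinterpret_cast<const D*>(buffer);
}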

How much data do you need? Always sizeof(D). So serialize always gives you a const char * pointer to sizeof(D) bytes of data, and you need to read sizeof(D) bytes before you can pass them to deserialize.

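Now, of course, you can pop all of this into a std::vector<uint8_t>. But the neat part is that no memory copy is needed at all. You can use the object itself for serialization, and the raw char * data from whatever medium you are deserializing from, without any copies or expensive field manipulation.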
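If you do want the bytes in a std::vector<uint8_t>, a copying variant could look like the sketch below. It trades the zero-copy property for one memcpy in each direction, which also sidesteps any alignment or aliasing questions about the source buffer. The names `to_bytes` and `from_bytes` are made up for this example, and the size check assumes a 4-byte float and an 8-byte double.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

#pragma pack(push, 1)
struct D {
    std::int32_t Int32;
    std::int64_t Int64;
    float Float;
    double Double;
};
#pragma pack(pop)

// memcpy round trips are only valid for trivially copyable types.
static_assert(std::is_trivially_copyable_v<D>, "D must be trivially copyable");
// With pack(1) there is no padding, so the size is the sum of the members.
static_assert(sizeof(D) == 4 + 8 + 4 + 8, "unexpected padding or type sizes");

// Copy the object representation of d into a byte vector.
std::vector<std::uint8_t> to_bytes(const D& d) {
    std::vector<std::uint8_t> out(sizeof(D));
    std::memcpy(out.data(), &d, sizeof(D));
    return out;
}

// Rebuild a D from the first sizeof(D) bytes of the vector.
D from_bytes(const std::vector<std::uint8_t>& in) {
    assert(in.size() >= sizeof(D));
    D d;
    std::memcpy(&d, in.data(), sizeof(D));
    return d;
}
```

The byte-order caveat still applies: the vector holds the host's native representation, so both ends have to agree on endianness.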

Oh, and edited to add: something like Google protobufs or Cap'n Proto may help with the general case of the problem you are trying to solve.

2 answers

Answer 1
1 Accepted 2022-08-23 23:26:36

Answer 2
0 2022-08-24 01:26:00