Serializing values to a string of bytes in a platform-independent way

I'm writing some serialization code that will work at a lower level than I'm used to. I need functions to take various value types (int32_t, int64_t, float, etc.) and shove them into a vector<unsigned char> in preparation for being written to a file. The file will be read and reconstituted in an analogous way.

The functions to write to the vector look like this:

void write_int32(std::vector<unsigned char>& buffer, int32_t value)
{
    buffer.push_back((value >> 24) & 0xff);
    buffer.push_back((value >> 16) & 0xff);
    buffer.push_back((value >> 8) & 0xff);
    buffer.push_back(value & 0xff);
}

void write_float(std::vector<unsigned char>& buffer, float value)
{
    assert(sizeof(float) == sizeof(int32_t));

    write_int32(buffer, *(int32_t *)&value);
}

These bit-shifting, type-punning atrocities seem to work on the single machine I've used so far, but they feel extremely fragile. Where can I learn which operations are guaranteed to yield the same results across architectures, float representations, etc.? Specifically, is there a safer way to do what I've done in these two example functions?

A human-readable representation is the safest. XML with an XSD schema is one option that lets you specify values and formats exactly.

If you really want a binary representation, look at the hton* and ntoh* functions:

http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
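
For what it's worth, the same big-endian layout can be produced without hton* at all. Below is a minimal sketch, not a definitive implementation: the read_* functions and write_uint32 are names I made up for illustration, and the float handling assumes a 32-bit IEEE 754 float whose byte order matches the platform's integer byte order (true on common hardware, but not guaranteed by the standard). Shifting an unsigned value avoids the implementation-defined behavior of right-shifting negative signed integers, and std::memcpy replaces the pointer cast, which violates the strict aliasing rules.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Append a 32-bit value most-significant byte first (big-endian),
// regardless of the host's native byte order.
void write_uint32(std::vector<unsigned char>& buffer, std::uint32_t value)
{
    buffer.push_back(static_cast<unsigned char>((value >> 24) & 0xff));
    buffer.push_back(static_cast<unsigned char>((value >> 16) & 0xff));
    buffer.push_back(static_cast<unsigned char>((value >> 8) & 0xff));
    buffer.push_back(static_cast<unsigned char>(value & 0xff));
}

// Reassemble four big-endian bytes starting at `offset`.
std::uint32_t read_uint32(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    assert(offset + 4 <= buffer.size());
    return (static_cast<std::uint32_t>(buffer[offset]) << 24) |
           (static_cast<std::uint32_t>(buffer[offset + 1]) << 16) |
           (static_cast<std::uint32_t>(buffer[offset + 2]) << 8) |
           static_cast<std::uint32_t>(buffer[offset + 3]);
}

void write_int32(std::vector<unsigned char>& buffer, std::int32_t value)
{
    // Signed-to-unsigned conversion is well defined (modulo 2^32).
    write_uint32(buffer, static_cast<std::uint32_t>(value));
}

std::int32_t read_int32(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    const std::uint32_t bits = read_uint32(buffer, offset);
    std::int32_t value;
    // int32_t is required to be two's complement without padding bits,
    // so copying the bit pattern recovers the original value.
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

void write_float(std::vector<unsigned char>& buffer, float value)
{
    // Assumptions, checked at compile time where possible.
    static_assert(sizeof(float) == sizeof(std::uint32_t), "float must be 32 bits");
    static_assert(std::numeric_limits<float>::is_iec559, "float must be IEEE 754");
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);  // copy the bit pattern, no aliasing
    write_uint32(buffer, bits);
}

float read_float(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    const std::uint32_t bits = read_uint32(buffer, offset);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Writing -2 with write_int32 and then 1.5f with write_float should leave the eight bytes ff ff ff fe 3f c0 00 00 in the buffer on any platform that meets the assumptions above, and read_int32(buffer, 0) and read_float(buffer, 4) give the values back.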

Usually the best way to do this is to employ an external library designed for this purpose. It's all too easy to introduce platform disagreement bugs, especially when trying to transmit data such as floating-point values. There are multiple open-source options that do this. One example is Google Protocol Buffers, which in addition to being platform-neutral has the benefit of being language-independent (it generates serialization code based on messages you define).

I wanted something quick and lightweight, so I whipped up a simple and stupid text serialization format. Each value is written to the file using something barely more complicated than:

output_buffer << value << ' ';

Protocol Buffers would have worked okay, but I was worried they'd take too long to integrate. XML's verbosity would have been a problem for me: I need to serialize thousands of values, and even having <a>...</a> wrapping each number would have added nearly a megabyte to each file. I tried MessagePack, but it just seemed like an awkward fit with C++'s static typing. What I came up with isn't clever, but it works great.
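
For anyone who wants to try the same idea, here is a rough sketch of what such a space-separated text format could look like. It is not my exact code: configure_stream, write_value and read_value are hypothetical names chosen for this example. The two details worth noting are the stream precision, which is raised to max_digits10 so that floating-point values read back exactly, and the classic "C" locale, which keeps the decimal separator the same on every machine.

```cpp
#include <limits>
#include <locale>
#include <sstream>
#include <string>

// Prepare a stream so that the text it produces (or parses) does not depend
// on the user's locale and keeps enough digits for an exact round trip.
void configure_stream(std::ios_base& stream)
{
    stream.imbue(std::locale::classic());
    stream.precision(std::numeric_limits<double>::max_digits10);
}

// Append one value followed by a single space as the separator.
template <typename T>
void write_value(std::ostream& out, const T& value)
{
    out << value << ' ';
}

// Read the next whitespace-separated value; returns false on failure.
template <typename T>
bool read_value(std::istream& in, T& value)
{
    return static_cast<bool>(in >> value);
}
```

Usage is just a matter of calling write_value on a configured std::ostringstream for each field, then reading the fields back in the same order from a std::istringstream. One caveat with this sketch: character-sized integer types such as int8_t are streamed as characters rather than numbers, so they would need to be widened to int first.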
