
Serializing values to a string of bytes in a platform-independent way

I'm writing some serialization code that will work at a lower level than I'm used to. I need functions to take various value types (int32_t, int64_t, float, etc.) and shove them into a vector<unsigned char> in preparation for being written to a file. The file will be read and reconstituted in an analogous way.

The functions to write to the vector look like this:

void write_int32(std::vector<unsigned char>& buffer, int32_t value)
{
    buffer.push_back((value >> 24) & 0xff);
    buffer.push_back((value >> 16) & 0xff);
    buffer.push_back((value >> 8) & 0xff);
    buffer.push_back(value & 0xff);
}

void write_float(std::vector<unsigned char>& buffer, float value)
{
    assert(sizeof(float) == sizeof(int32_t));

    write_int32(buffer, *(int32_t *)&value);
}

These bit-shifting, type-punning atrocities seem to work on the single machine I've used so far, but they feel extremely fragile. Where can I learn which operations are guaranteed to yield the same results across architectures, float representations, etc.? Specifically, is there a safer way to do what I've done in these two example functions?
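For reference, here is a minimal sketch of how the two functions could be made less fragile, together with a matching reader. The names write_int32_be, write_float_be, read_int32_be and read_float_be are hypothetical, not from the original post. It assumes the float type is a 32-bit IEEE 754 value (checked at compile time) and that floats use the same byte order as integers on the host, which holds on common platforms but is not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <limits>
#include <vector>

// Hypothetical variant of write_int32: shifts an unsigned copy, because
// right-shifting a negative signed value is implementation-defined before C++20.
void write_int32_be(std::vector<unsigned char>& buffer, std::int32_t value)
{
    const std::uint32_t bits = static_cast<std::uint32_t>(value);
    buffer.push_back(static_cast<unsigned char>((bits >> 24) & 0xff));
    buffer.push_back(static_cast<unsigned char>((bits >> 16) & 0xff));
    buffer.push_back(static_cast<unsigned char>((bits >> 8) & 0xff));
    buffer.push_back(static_cast<unsigned char>(bits & 0xff));
}

// Hypothetical variant of write_float: memcpy copies the bit pattern and
// avoids the strict-aliasing violation of *(int32_t *)&value.
void write_float_be(std::vector<unsigned char>& buffer, float value)
{
    static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits wide");
    static_assert(std::numeric_limits<float>::is_iec559, "sketch assumes IEEE 754 floats");

    std::int32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    write_int32_be(buffer, bits);
}

// Rebuilds a big-endian 32-bit value from the four bytes starting at offset.
std::int32_t read_int32_be(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    assert(offset <= buffer.size() && buffer.size() - offset >= 4);

    // Assemble in an unsigned type so every shift is well defined.
    const std::uint32_t bits =
        (static_cast<std::uint32_t>(buffer[offset]) << 24) |
        (static_cast<std::uint32_t>(buffer[offset + 1]) << 16) |
        (static_cast<std::uint32_t>(buffer[offset + 2]) << 8) |
        static_cast<std::uint32_t>(buffer[offset + 3]);

    std::int32_t value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

float read_float_be(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    static_assert(sizeof(float) == sizeof(std::int32_t), "float must be 32 bits wide");

    const std::int32_t bits = read_int32_be(buffer, offset);
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}
```

Because the bytes are produced by shifts rather than by copying memory, the file layout is big-endian regardless of the host's byte order. Only the float bit pattern itself remains a platform assumption.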

A human-readable representation is the safest. XML with an XSD is one option that lets you specify both values and format exactly.

If you really want a binary representation, look at the hton* and ntoh* functions, which convert integers between host byte order and network (big-endian) byte order:

http://beej.us/guide/bgnet/output/html/multipage/htonsman.html
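As a rough illustration of that approach, the sketch below uses htonl and ntohl to store a 32-bit value in network byte order. The helper names write_uint32_net and read_uint32_net are made up for this example, and it assumes a POSIX system where these functions come from <arpa/inet.h> (on Windows they live in <winsock2.h> instead).

```cpp
#include <arpa/inet.h>  // htonl, ntohl (POSIX; an assumption of this sketch)
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: appends value to buffer in network (big-endian) order.
void write_uint32_net(std::vector<unsigned char>& buffer, std::uint32_t value)
{
    const std::uint32_t net = htonl(value);  // host order -> network order
    unsigned char bytes[sizeof net];
    std::memcpy(bytes, &net, sizeof net);    // copy out the raw bytes
    buffer.insert(buffer.end(), bytes, bytes + sizeof net);
}

// Hypothetical helper: reads four bytes at offset and converts them back.
std::uint32_t read_uint32_net(const std::vector<unsigned char>& buffer, std::size_t offset)
{
    assert(offset <= buffer.size() && buffer.size() - offset >= sizeof(std::uint32_t));

    std::uint32_t net;
    std::memcpy(&net, buffer.data() + offset, sizeof net);
    return ntohl(net);                       // network order -> host order
}
```

Note that htonl and htons only cover 32-bit and 16-bit integers; 64-bit values and floats would still need separate handling.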

Usually the best way to do this is to use an external library designed for the purpose. It's all too easy to introduce platform disagreement bugs, especially when transmitting floating-point values. There are several open-source options. One example is Google Protocol Buffers, which in addition to being platform-neutral has the benefit of being language-independent (it generates serialization code from messages you define).

I wanted something quick and lightweight so I whipped up a simple and stupid text serialization format. Each value is written to the file using something barely more complicated than

output_buffer << value << ' ';

Protocol Buffers would have worked okay, but I was worried they'd take too long to integrate. XML's verbosity would have been a problem for me: I need to serialize thousands of values, and even having <a>...</a> wrapping each number would have added nearly a megabyte to each file. I tried MessagePack, but it just seemed like an awkward fit with C++'s static typing. What I came up with isn't clever, but it works great.
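A minimal sketch of what such a space-separated text format might look like is below. The function names write_value and read_value are hypothetical and this is not the actual code described above. It assumes the streams use the default "C" locale, and it writes floats with max_digits10 significant digits so that the text reads back to the same value.

```cpp
#include <cassert>
#include <iomanip>
#include <limits>
#include <sstream>
#include <string>

// Hypothetical writer: one value followed by a single space.
template <typename T>
void write_value(std::ostream& out, const T& value)
{
    out << value << ' ';
}

// Floats need enough significant digits to survive the round trip.
// Note that setprecision stays in effect on the stream afterwards.
void write_value(std::ostream& out, float value)
{
    out << std::setprecision(std::numeric_limits<float>::max_digits10)
        << value << ' ';
}

// Hypothetical reader: extracts the next whitespace-delimited value.
template <typename T>
T read_value(std::istream& in)
{
    T value{};
    in >> value;
    assert(in && "malformed or truncated input");
    return value;
}
```

Writing 42 and then 0.1f to a std::ostringstream with write_value, and reading them back from a std::istringstream with read_value<int> and read_value<float>, should return the same two values. Character-sized integer types such as int8_t would need a cast to int first, since streams treat them as characters.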
