Save and load data using ANSI C on any platform

Say I have 1 million structs, each containing integers, doubles, strings, and other structs, something like:

struct s1 {
    int f1;
    long f2;
    char* f3;
};

struct s2 {
    struct s1* f1;
    double f2;
};

How can I save these to a file in binary format, then look up and load them from that file, on platforms different from the one the executable was compiled on, without worrying about endianness, float representation and other platform-specific gotchas?

The main reason for preferring a binary format is the size of the resulting file. If each integer is stored as text such as "32435" and I have millions of them, the extra 3 bytes per integer would add quite a bit to the file size.

Write them as ASCII text, XML or some similar non-binary format.
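A minimal sketch of the text approach for s1, using only stdio. The record layout (`f1 f2 len:bytes`) and the function names are hypothetical; the string is length-prefixed so it may contain spaces or newlines.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct s1 {
    int f1;
    long f2;
    char* f3;
};

/* Write one s1 as a text record: "<f1> <f2> <len>:<bytes>\n".
   The length prefix lets f3 contain spaces or newlines. */
int s1_write_text(FILE *fp, const struct s1 *s)
{
    unsigned long len = (unsigned long)strlen(s->f3);
    if (fprintf(fp, "%d %ld %lu:", s->f1, s->f2, len) < 0)
        return -1;
    if (fwrite(s->f3, 1, len, fp) != len)
        return -1;
    return fputc('\n', fp) == EOF ? -1 : 0;
}

/* Read a record written by s1_write_text.
   On success f3 is malloc'd and the caller must free it. */
int s1_read_text(FILE *fp, struct s1 *s)
{
    unsigned long len;
    if (fscanf(fp, "%d %ld %lu:", &s->f1, &s->f2, &len) != 3)
        return -1;
    s->f3 = malloc(len + 1);
    if (s->f3 == NULL)
        return -1;
    /* Read exactly len raw bytes, then expect the record's newline. */
    if (fread(s->f3, 1, len, fp) != len || fgetc(fp) != '\n') {
        free(s->f3);
        s->f3 = NULL;
        return -1;
    }
    s->f3[len] = '\0';
    return 0;
}
```

For double fields, printing with `"%.17g"` gives enough significant digits for an IEEE 754 double to read back as the same value, provided the C library's conversions are correctly rounded.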

"platforms different from the one the executable was compiled on"

How different from the one the executable was compiled on? Do you need to support platforms that use non-IEEE floats? Platforms that use non-ASCII characters? Platforms that use non-8-bit bytes?
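Those questions can be checked on each target. As a sketch (the function name `platform_is_conventional` is hypothetical), a program could test the usual assumptions with the `<limits.h>` and `<float.h>` macros that C89 already provides. Passing these checks is consistent with, but does not prove, full IEEE 754 behaviour.

```c
#include <float.h>
#include <limits.h>

/* Returns 1 if the platform matches the assumptions a simple binary
   format might rely on, 0 otherwise. Illustrative, not exhaustive:
   e.g. NaN/infinity semantics are not checked. */
int platform_is_conventional(void)
{
    /* Bytes are exactly 8 bits. */
    if (CHAR_BIT != 8)
        return 0;
    /* double has the radix, precision and exponent range of
       IEEE 754 binary64. */
    if (FLT_RADIX != 2 || DBL_MANT_DIG != 53 ||
        DBL_MIN_EXP != -1021 || DBL_MAX_EXP != 1024)
        return 0;
    /* Negative integers use two's complement
       (sign-magnitude gives 1, one's complement gives 2). */
    if ((-1 & 3) != 3)
        return 0;
    /* The execution character set agrees with ASCII on a few samples. */
    if ('A' != 65 || 'a' != 97 || '0' != 48 || ' ' != 32)
        return 0;
    return 1;
}
```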

If you insist on binary, and on doing it yourself, your best bet is probably to define the storage format so that int and long are each stored as a sequence of 4 bytes, little-endian (or big-endian, but pick one and stick to it regardless of platform), with exactly 8 significant bits per byte. double is likewise stored as an 8-byte IEEE double in the same byte order. Pointers introduce a whole world of hurt: the storage format must attach a unique identifier to each instance of s1, so that a pointer to s1 can be stored as an id value and looked up as part of deserialization.
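A minimal sketch of that storage format in ANSI C, assuming little-endian is the chosen order. The helper names (`put_i32le`, `get_i32le`, `put_f64le`, `get_f64le`) are hypothetical. The integer helpers go through `unsigned long`, so the bytes do not depend on the host's signed representation; the double helpers assume the host double is an 8-byte IEEE 754 value and only fix up its byte order.

```c
#include <string.h>

/* Storage format (a choice, not a standard): int and long fields are
   4 bytes, little-endian, two's complement; double fields are 8 bytes,
   little-endian, IEEE 754 binary64. Each stored byte carries 8 bits. */

void put_i32le(unsigned char *out, long v)
{
    /* Unsigned conversion is defined modulo 2^N, so the low 32 bits are
       the two's complement pattern of v on any host. */
    unsigned long u = (unsigned long)v;
    out[0] = (unsigned char)(u & 0xFF);
    out[1] = (unsigned char)((u >> 8) & 0xFF);
    out[2] = (unsigned char)((u >> 16) & 0xFF);
    out[3] = (unsigned char)((u >> 24) & 0xFF);
}

long get_i32le(const unsigned char *in)
{
    unsigned long u = (unsigned long)(in[0] & 0xFF)
                    | (unsigned long)(in[1] & 0xFF) << 8
                    | (unsigned long)(in[2] & 0xFF) << 16
                    | (unsigned long)(in[3] & 0xFF) << 24;
    /* Map the two's complement pattern back to a signed value without an
       implementation-defined unsigned-to-signed conversion. */
    if (u & 0x80000000UL)
        return -(long)(0xFFFFFFFFUL - u) - 1;
    return (long)u;
}

/* ASSUMPTION: host double is 8-byte IEEE 754 binary64, stored fully
   little-endian or fully big-endian (no mixed word order).
   1.0 is 0x3FF0000000000000, so its last byte is 0x3F only on LE. */
static int host_double_is_le(void)
{
    double one = 1.0;
    unsigned char b[8];
    memcpy(b, &one, 8);
    return b[7] == 0x3F;
}

void put_f64le(unsigned char *out, double d)
{
    unsigned char b[8];
    int i, le = host_double_is_le();
    memcpy(b, &d, 8);
    for (i = 0; i < 8; i++)
        out[i] = le ? b[i] : b[7 - i];
}

double get_f64le(const unsigned char *in)
{
    unsigned char b[8];
    double d;
    int i, le = host_double_is_le();
    for (i = 0; i < 8; i++)
        b[le ? i : 7 - i] = in[i];
    memcpy(&d, b, 8);
    return d;
}
```

Writing a record is then a matter of filling a buffer with these helpers and passing it to `fwrite`; reading reverses the process with `fread`.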

Different platforms can then decide which native types to use for each storage type. For example, if int is only 16 bits on a given platform, it will have to use long for both the int and the long fields; for this reason, you should give the storage types domain-specific names rather than calling them int and long. Beware that converting double values to and from an incompatible representation can lose precision that no amount of care will recover, since the two representations might not have the same number of significant bits.
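A sketch of those domain-specific names as typedefs selected with `<limits.h>`. The names `s1_f1_t`, `s1_f2_t` and `struct s1_native` are hypothetical; the point is that each platform picks a native type wide enough for the 32-bit storage field.

```c
#include <limits.h>

/* Hypothetical names for s1's two 4-byte storage fields. */
#if INT_MAX >= 2147483647
typedef int  s1_f1_t;   /* int has at least 32 bits here */
#else
typedef long s1_f1_t;   /* 16-bit int: fall back to long */
#endif
typedef long s1_f2_t;   /* long always has at least 32 bits */

/* In-memory form of s1 on this platform. */
struct s1_native {
    s1_f1_t f1;
    s1_f2_t f2;
    char   *f3;
};
```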

For text, non-ASCII platforms will have to include code to convert their native text to ASCII when serializing, and to convert ASCII back to native text when deserializing. Strictly speaking, you should also avoid any characters in the text that aren't in the C basic character set, since they might not be representable on the target at all. You can make a similar decision about whether you're willing to count on target platforms supporting Unicode in some way; if so, UTF-8 is a reasonable interchange format for text.
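One way to do that conversion without assuming the host uses ASCII is a lookup table written in the host's own character set that lists the printable ASCII characters in ASCII order; a character's position in the table then gives its ASCII code. A sketch with hypothetical function names, handling only printable ASCII plus line feed:

```c
#include <string.h>

/* Printable ASCII 0x20..0x7E in ASCII order, spelled in the host's
   character set. '$', '@' and '`' are outside the C basic character
   set, so strictly portable text should avoid them. */
static const char ascii_printable[] =
    " !\"#$%&'()*+,-./0123456789:;<=>?"
    "@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_"
    "`abcdefghijklmnopqrstuvwxyz{|}~";

/* Native char -> ASCII code, or -1 if it has no ASCII equivalent here. */
int native_to_ascii(char c)
{
    const char *p;
    if (c == '\n')
        return 0x0A;
    if (c == '\0')
        return -1;          /* strchr would match the terminator */
    p = strchr(ascii_printable, c);
    return p ? 0x20 + (int)(p - ascii_printable) : -1;
}

/* ASCII code -> native char (as unsigned char), or -1 if unsupported. */
int ascii_to_native(int code)
{
    if (code == 0x0A)
        return '\n';
    if (code >= 0x20 && code <= 0x7E)
        return (unsigned char)ascii_printable[code - 0x20];
    return -1;
}
```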

Finally, for each struct on each platform, you can either:

  1. write (or perhaps auto-generate) code to serialize it, and code to deserialize it, or:
  2. make yourself a domain-specific language to define the structures, and a parser/interpreter that will serialize and deserialize according to that definition.
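Option 1, applied to s2 together with the id scheme for pointers described earlier, might look like the sketch below. The 12-byte record layout, the `S1_NULL_ID` sentinel and the function names are hypothetical. The id of an s1 is taken to be its index in the array of s1 records, so s1 records saved in order need no explicit id field.

```c
#include <stddef.h>
#include <string.h>

struct s1 { int f1; long f2; char *f3; };
struct s2 { struct s1 *f1; double f2; };

/* Hypothetical id reserved for a null s1 pointer. */
#define S1_NULL_ID 0xFFFFFFFFUL

static void put_u32le(unsigned char *out, unsigned long u)
{
    int i;
    for (i = 0; i < 4; i++)
        out[i] = (unsigned char)((u >> (8 * i)) & 0xFF);
}

static unsigned long get_u32le(const unsigned char *in)
{
    unsigned long u = 0;
    int i;
    for (i = 3; i >= 0; i--)
        u = (u << 8) | (unsigned long)(in[i] & 0xFF);
    return u;
}

/* Write s2 as 12 bytes: 4-byte id of the pointed-to s1, then f2.
   ASSUMPTION: f2 is copied raw, which matches the little-endian format
   only on little-endian IEEE 754 hosts; big-endian hosts must reverse
   the 8 bytes. */
void s2_serialize(unsigned char out[12], const struct s2 *s,
                  const struct s1 *pool)
{
    put_u32le(out, s->f1 ? (unsigned long)(s->f1 - pool) : S1_NULL_ID);
    memcpy(out + 4, &s->f2, 8);
}

/* Rebuild s2, turning the stored id back into a pointer into 'pool'
   (the s1 array loaded earlier). Returns -1 for an out-of-range id. */
int s2_deserialize(struct s2 *s, const unsigned char in[12],
                   struct s1 *pool, size_t pool_len)
{
    unsigned long id = get_u32le(in);
    if (id == S1_NULL_ID)
        s->f1 = NULL;
    else if (id < pool_len)
        s->f1 = &pool[id];
    else
        return -1;
    memcpy(&s->f2, in + 4, 8);
    return 0;
}
```

Loading therefore has to happen in two passes, or at least in dependency order: all s1 records first, so that every id in an s2 record refers to an s1 that already exists in memory.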

Sounds like a lot of work to me, though, to do something that's been done before.

If you want to avoid the headaches that you've described, DON'T use binary. Use text, the universal* format.

*until you start getting into locales.
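The locale problem shows up with doubles: `printf` and `strtod` use the decimal-point character of the current LC_NUMERIC locale, so a file written under a locale that uses ',' may not parse under another. Every C program starts in the "C" locale, so this only bites after something like `setlocale(LC_ALL, "")`. A sketch (the function name is hypothetical) that pins the numeric locale to "C" while formatting; because `setlocale` changes process-wide state, it is not safe while other threads are formatting numbers.

```c
#include <locale.h>
#include <stdio.h>
#include <string.h>

/* Format d with "%.17g" using '.' as the decimal point regardless of the
   current locale. buf must hold at least 32 chars; "%.17g" needs at most
   25 including the terminator. */
void format_double_c_locale(char buf[32], double d)
{
    char saved[256];
    const char *cur = setlocale(LC_NUMERIC, NULL);
    /* setlocale's result may be overwritten by the next call, so copy it. */
    int restore = cur != NULL && strlen(cur) < sizeof saved;
    if (restore)
        strcpy(saved, cur);
    setlocale(LC_NUMERIC, "C");
    sprintf(buf, "%.17g", d);
    if (restore)
        setlocale(LC_NUMERIC, saved);
}
```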
