
How should I write a struct to a file in C?

I'm trying to write a struct to a binary file. I want my code to be cross-platform, so I'm not sure about writing the whole struct with a single fwrite. If I did, the size of the struct would vary with the size of the primitive types on each platform (an int on platform A may not have the same size as on platform B, so the struct would not be the same size either, and the resulting files would differ).

But I know nothing about this, so should I write each member of the struct individually, serialize the struct (how can I do this?), or just write the struct with fwrite? Remember that the written file should be compatible across platforms.

Thanks in advance

EDIT: My struct is something like

typedef struct {
    int health;
    float x, y;
    char ID[];
} Player;

Since you stated that you're building a game prototype and you mainly want to work with x86 and x86_64 CPUs, binary serialization is somewhat easier.

But first, there are some things to bear in mind:

  • long values have different sizes on x86 and x86_64 (4 versus 8 bytes on Linux and macOS; on 64-bit Windows long stays at 4 bytes).
  • So, long values also have different alignment requirements.
  • double values may require 8 or 4 byte alignment, depending on OS and architecture.
  • Endianness is usually a problem, but newer Apple products do not use PowerPC, so you can pretty much guarantee that you're targeting little-endian machines.

One last piece of information before we begin: you can gather information about the system architecture at compile time using compiler macros. In gcc, __LP64__ means that you're compiling a 64-bit executable. I'm sure there are similar macros for MSVC.
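As a rough sketch of the compile-time check described above: __LP64__ is the gcc macro named in the answer, and _WIN64 is the macro MSVC predefines for 64-bit Windows targets. TARGET_IS_64BIT and target_pointer_bits are hypothetical names used only for illustration.

```c
/* __LP64__ is predefined by gcc (and clang) on targets where long and
   pointers are 64 bits wide. _WIN64 is predefined by MSVC when building
   for 64-bit Windows. Anything else is treated as 32-bit here, which is
   an assumption that holds for x86 and x86_64 only. */
#if defined(__LP64__) || defined(_WIN64)
#define TARGET_IS_64BIT 1
#else
#define TARGET_IS_64BIT 0
#endif

/* Hypothetical helper: the pointer width, in bits, implied by the macros. */
int target_pointer_bits(void)
{
    return TARGET_IS_64BIT ? 64 : 32;
}
```

Because the decision is made by the preprocessor, the same macros can also select between alternative typedefs or struct layouts per target.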

Tackling the problem of differently sized variables:

The stdint.h header contains typedefs for int8_t, int16_t, int32_t, int64_t, uint8_t, uint16_t, uint32_t and uint64_t. Each is guaranteed to have exactly that number of bits. You can safely use int64_t instead of long across platforms. If a platform lacks those typedefs for some reason, you can define them yourself, since you can get information about the target system from preprocessor macros anyway.
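For illustration, here is a small record declared with the stdint.h typedefs. PlayerRecord and its score and level fields are hypothetical; only health comes from the question's struct.

```c
#include <stdint.h>

/* Every integer member has an explicit width, so its size is the same on
   x86 and x86_64 regardless of how wide int or long happen to be. */
typedef struct {
    int32_t  health;  /* exactly 32 bits */
    int64_t  score;   /* hypothetical field: exactly 64 bits, unlike long */
    uint16_t level;   /* hypothetical field: exactly 16 bits */
} PlayerRecord;

/* C11 compile-time checks: the build fails if a width is not what the
   file format expects (assuming 8-bit bytes). */
_Static_assert(sizeof(int32_t) == 4, "int32_t must be 4 bytes");
_Static_assert(sizeof(int64_t) == 8, "int64_t must be 8 bytes");
_Static_assert(sizeof(uint16_t) == 2, "uint16_t must be 2 bytes");
```

Fixed-width members settle the size of each field, but not the padding the compiler may place between them; that is a separate concern.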

Tackling the problem of different alignment:

This one is hairy. The simplest solution is to have a serializable_player struct containing all the fields of the player struct, but tell the compiler to pack it (the packed attribute in gcc) so that it inserts no padding. Then, when writing to a file, you create a serializable_player from a player and write it directly.
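A minimal sketch of that idea, using gcc's packed attribute. It assumes a fixed-size ID of PLAYER_ID_LEN bytes (the question's struct uses a flexible array member, whose length is not specified), and the function names write_player and read_player are made up for this example.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PLAYER_ID_LEN 16  /* assumption: IDs fit in 16 bytes */

/* In-memory type used by the game: natural alignment, not packed. */
typedef struct {
    int health;
    float x, y;
    char id[PLAYER_ID_LEN];
} player;

/* On-disk type: fixed-width members and no padding (gcc/clang attribute). */
typedef struct __attribute__((packed)) {
    int32_t health;
    float x, y;
    char id[PLAYER_ID_LEN];
} serializable_player;

/* Converts a player to its packed form and writes it. Returns 0 on success. */
int write_player(FILE *fp, const player *p)
{
    serializable_player sp;
    memset(&sp, 0, sizeof sp);
    sp.health = (int32_t)p->health;
    sp.x = p->x;
    sp.y = p->y;
    memcpy(sp.id, p->id, PLAYER_ID_LEN);
    return fwrite(&sp, sizeof sp, 1, fp) == 1 ? 0 : -1;
}

/* Reads one packed record and converts it back. Returns 0 on success. */
int read_player(FILE *fp, player *p)
{
    serializable_player sp;
    if (fread(&sp, sizeof sp, 1, fp) != 1)
        return -1;
    p->health = sp.health;
    p->x = sp.x;
    p->y = sp.y;
    memcpy(p->id, sp.id, PLAYER_ID_LEN);
    return 0;
}
```

This still writes floats and integers in the host's byte order and float format, so it only addresses padding and member size.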

Converting to and from a serializable type adds some overhead. If you can afford to waste a little memory, you can instead enforce an explicit alignment on each member of the struct. Aligning double values to 8 bytes may be a good idea here. In gcc, you do that with the aligned attribute.
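The alternative could look roughly like this. aligned_player is a hypothetical variant that stores coordinates as double (the question's struct uses float) so that the 8-byte alignment mentioned above matters.

```c
#include <stddef.h>
#include <stdint.h>

/* Each member carries an explicit alignment (gcc/clang attribute), so the
   layout no longer depends on the platform's default alignment of double. */
typedef struct {
    int32_t health __attribute__((aligned(4)));
    double  x      __attribute__((aligned(8)));
    double  y      __attribute__((aligned(8)));
} aligned_player;

/* C11 compile-time checks of the layout the file format relies on. */
_Static_assert(offsetof(aligned_player, health) == 0, "health at offset 0");
_Static_assert(offsetof(aligned_player, x) == 8, "x at offset 8");
_Static_assert(offsetof(aligned_player, y) == 16, "y at offset 16");
_Static_assert(sizeof(aligned_player) == 24, "24 bytes in total");
```

The 4 bytes between health and x are padding. If the struct is written with fwrite, zeroing it first (for example with memset) keeps indeterminate bytes out of the file.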

Note: packed structs usually carry a performance penalty, and since you're going to use player throughout the game, do not make it packed.

Tackling different floating point representations:

If there is a chance that you may need to support an architecture that uses a different floating-point representation, you won't be able to store floats as raw binary.

I used to work at a game development company, and when sending floating-point values over the network we represented them as integers instead. What we did was this: decide on the minimum resolution we want, r. Then work out the minimum and maximum values the quantity can take (depending on the multiplayer map the players were in), min and max. With this representation we could represent about (max - min) / r different numbers, so we needed about log2((max - min) / r) bits, rounded up, to store one. Since the receiver also knew min, max and r, we did not need to include them in the network packet. We said that min is represented as all 0s, max as all 1s, and the other values fall in between. It was a real-time action game and the maps weren't too big (multiplayer supported 64 players); we had no trouble even with fewer than 32 bits: players weren't shaking or flickering, and performance was good.

If you use a similar approach, you will have no trouble serializing/deserializing floating points in binary.
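The scheme described above might be sketched as follows. This is not the original company's code; the names bits_needed, quantize and dequantize are invented here, and the sketch assumes max > min, r > 0 and a bit count between 1 and 32.

```c
#include <stdint.h>

/* Smallest bit count (1..32) whose largest code, 2^bits - 1, covers
   (max - min) / r steps. */
unsigned bits_needed(float min, float max, float r)
{
    double steps = ((double)max - (double)min) / (double)r;
    unsigned bits = 1;
    while (bits < 32 && (double)((1ull << bits) - 1ull) < steps)
        bits++;
    return bits;
}

/* Largest code for a given bit count: all 1s. */
static uint32_t max_code(unsigned bits)
{
    return bits >= 32 ? UINT32_MAX : ((1u << bits) - 1u);
}

/* Maps value in [min, max] to an integer code: min -> all 0s, max -> all 1s.
   Values outside the range are clamped. */
uint32_t quantize(float value, float min, float max, unsigned bits)
{
    double t;
    if (value < min) value = min;
    if (value > max) value = max;
    t = ((double)value - (double)min) / ((double)max - (double)min);
    return (uint32_t)(t * (double)max_code(bits) + 0.5);  /* round to nearest */
}

/* Inverse mapping: recovers an approximation of the original value. */
float dequantize(uint32_t code, float min, float max, unsigned bits)
{
    double t = (double)code / (double)max_code(bits);
    return (float)((double)min + t * ((double)max - (double)min));
}
```

For a coordinate in [0, 100] stored in 16 bits, the round-trip error is at most half a step, roughly 0.0008. The resulting integer codes can then be written like any other fixed-width integer.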

Finally, if you target a big-endian architecture in the future, only small changes would be necessary (provided you're not using a union). You only need convert_to_le_* functions for each size. Those functions should be no-ops on little-endian machines and swap bytes with bitwise arithmetic on big-endian machines. Before each serialization and after each deserialization, you call these functions on every member that would be represented differently on a big-endian machine. Favoring little-endian machines is probably the better choice here, since your main audience will most likely have x86 and x86_64 machines.

If you're using a union, your struct representation would also need to differ between architectures.
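One possible shape for such helpers is shown below. The answer suggests compiling them as empty functions on little-endian targets; this sketch instead detects the byte order at run time, which is simpler to show in a few lines. Only the convert_to_le_ prefix comes from the answer; the other names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Returns nonzero when the host stores the least significant byte first. */
int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first;
    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Byte swaps done with shifts and masks. */
uint16_t swap_u16(uint16_t v)
{
    return (uint16_t)((v << 8) | (v >> 8));
}

uint32_t swap_u32(uint32_t v)
{
    return ((v & 0x000000FFu) << 24) | ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8)  | ((v & 0xFF000000u) >> 24);
}

/* No-op on little-endian hosts, byte swap on big-endian hosts. A swap is
   its own inverse, so the same call works after reading as well. */
uint16_t convert_to_le_u16(uint16_t v)
{
    return host_is_little_endian() ? v : swap_u16(v);
}

uint32_t convert_to_le_u32(uint32_t v)
{
    return host_is_little_endian() ? v : swap_u32(v);
}
```

After conversion, the bytes of the value in memory are in little-endian order on any host, so they can be handed to fwrite as they are.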

I know that storing values in JSON or plain text is easier and that optimizing for space on disk is almost always unnecessary. But if you are planning to make a multiplayer game, preparing for binary serialization and deserialization beforehand might pay off, since you won't have the luxury of sending JSON data in real time.

Edit: As suggested in the comments, JSON seems the better choice if you expect to change your file structure, add more fields and so on. If that is likely and you still decide on a binary file structure, your save files should start with a magic number in the first few bytes that represents the version. When you add or remove a field, you update the version. When reading a save file, you check the version first and handle each version accordingly.

Edit 2: Some parts of this answer are specific to x86 and x86_64 CPUs. For example, using int instead of int32_t makes sense there, since int is 4 bytes on both architectures. If a wider range of platforms needed to be supported, I would recommend only the int*_t typedefs. An example is the Linux kernel, in which s32 is used for a signed 32-bit integer wherever a specific size is required.
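A sketch of such a header, under a few assumptions of this example rather than the answer: the file starts with a 4-byte signature ("PSAV", made up here) followed by a 32-bit version number written byte by byte in little-endian order.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SAVE_MAGIC "PSAV"  /* hypothetical 4-byte file signature */

/* Writes the 8-byte header: signature, then version (little-endian).
   Returns 0 on success, -1 on failure. */
int write_save_header(FILE *fp, uint32_t version)
{
    unsigned char hdr[8];
    memcpy(hdr, SAVE_MAGIC, 4);
    hdr[4] = (unsigned char)(version & 0xFFu);
    hdr[5] = (unsigned char)((version >> 8) & 0xFFu);
    hdr[6] = (unsigned char)((version >> 16) & 0xFFu);
    hdr[7] = (unsigned char)((version >> 24) & 0xFFu);
    return fwrite(hdr, sizeof hdr, 1, fp) == 1 ? 0 : -1;
}

/* Reads and validates the header. On success stores the version and
   returns 0; returns -1 if the file is too short or the signature differs. */
int read_save_header(FILE *fp, uint32_t *version)
{
    unsigned char hdr[8];
    if (fread(hdr, sizeof hdr, 1, fp) != 1)
        return -1;
    if (memcmp(hdr, SAVE_MAGIC, 4) != 0)
        return -1;
    *version = (uint32_t)hdr[4] | ((uint32_t)hdr[5] << 8) |
               ((uint32_t)hdr[6] << 16) | ((uint32_t)hdr[7] << 24);
    return 0;
}
```

A loader would call read_save_header first and then branch on the version, for instance with a switch that picks the reader matching each known layout and rejects unknown versions.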

You may choose the easy way or the hard way.

Hard way: try writing the struct in a binary format and struggle with endianness and integer-size issues. This is a hard problem, and complete libraries have been designed to answer it (protobuf-c, tpl and Apache Avro are the first three examples I came across).

Now the easy way.

Write to a text file.

(ASCII) text format is almost the only thing you can count on for complete portability: it's a standard way of writing text, and we humans have more or less reached a compromise on how to write numbers.

As your structure is simple, you could just write the fields, each on its own line, and to load, just read each field one after the other.

The only issue that may arise is with ID, whose format you did not describe, so I cannot be more precise.
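A minimal sketch of the one-field-per-line approach, assuming the ID is a short string without whitespace stored in a fixed 32-byte buffer (the question does not say what it is). The function names are invented for this example.

```c
#include <stdio.h>

/* Variant of the question's struct with a fixed-size ID (an assumption). */
typedef struct {
    int health;
    float x, y;
    char id[32];
} Player;

/* Writes one field per line. %.9g prints enough significant digits for a
   float to survive the round trip. Returns 0 on success. */
int save_player_text(FILE *fp, const Player *p)
{
    int n = fprintf(fp, "%d\n%.9g\n%.9g\n%s\n", p->health, p->x, p->y, p->id);
    return n < 0 ? -1 : 0;
}

/* Reads the fields back in the same order. %31s limits the ID to the
   buffer size (31 characters plus the terminator). Returns 0 on success. */
int load_player_text(FILE *fp, Player *p)
{
    int n = fscanf(fp, "%d %f %f %31s", &p->health, &p->x, &p->y, p->id);
    return n == 4 ? 0 : -1;
}
```

If the ID can contain spaces, reading that line with fgets instead of %s would be one way to handle it.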

Hoping that helps,

Ekleog

Actually, you have to write an "interface specification", because the struct in the file, which is to be read by someone else, is an interface between your program and theirs.

So you define the format of the file (of the data in the file). For example: "every line holds one field. The first line holds the health as an ASCII number, the second line..." and so on.

Then you "publish" your specification for everyone who will ever need to read your files.

Of course, you must now adapt your program to emit the file as you have specified.
