
Binary deserialization: how can I find out what type of serialization this is?

I have a .dat file generated by a program. The program's owner doesn't object to people parsing and editing this file, but he won't answer any questions about its format.

The file mostly consists of variables that are defined in this way:

In most cases:

(4 bytes - length of the var name)(var name)(4 bytes of some internal var type)(4 bytes - possibly the element count)(X bytes of var value)

Rarely:

(4 bytes - length of the var name)(var name)(1 zero byte)(4 bytes of some internal var type)

So, for example:

([4 0 0 0][name])[11 0 0 0][1 0 0 0]([9 0 0 0][Alexander])

and

([5 0 0 0][names])[6 0 0 0](length [3 0 0 0])([4 0 0 0][John])([4 0 0 0][Anne])([7 0 0 0][SomeGuy])

I looked at Boost binary serialization, but it doesn't write variable names to the file, and I think it uses 8-byte fields rather than 4-byte ones.

There is no generic way to determine "what type of serialization" it is. The author of the format has made design decisions and arrived at a final format. It could be literally anything. You can make educated guesses ("reverse engineering") but the only way to know for sure is to obtain a specification from the author. Although you claim that he doesn't mind people manipulating files stored in this format, his refusal to provide said specification makes me wonder whether this is really true and, ultimately, means you may have to stick with guesswork.
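As an illustration of what such an educated guess can look like in practice, here is a minimal sketch that scans a byte string for 4-byte length prefixes followed by that many printable ASCII bytes. It assumes little-endian integers (suggested by `[4 0 0 0]` in the question) and ASCII names; the function name and the `min_len`/`max_len` thresholds are hypothetical choices, not anything known about the real format.

```python
import struct

def find_length_prefixed_strings(data, min_len=3, max_len=64):
    """Return (offset, text) pairs that look like length-prefixed strings.

    Assumption: a 4-byte little-endian length followed by that many
    printable ASCII bytes. This is a heuristic and can give false hits.
    """
    hits = []
    for offset in range(len(data) - 4):
        (length,) = struct.unpack_from("<I", data, offset)
        if not (min_len <= length <= max_len):
            continue
        chunk = data[offset + 4 : offset + 4 + length]
        # Keep only candidates that are fully present and printable.
        if len(chunk) == length and all(32 <= b < 127 for b in chunk):
            hits.append((offset, chunk.decode("ascii")))
    return hits
```

Running this over the real .dat file and looking at which bytes sit between consecutive hits is one way to test a guess about the type and count fields, but it cannot confirm the format.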

To add to BoundaryImposition's answer: there is no deserialization framework (that I know of) that can deal with "any" format. The format must be known and implemented by the library, so you will need to implement it yourself.
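A hand-written reader for the "most cases" layout guessed in the question could start out like the sketch below. Everything in it is an assumption taken from that guess: little-endian 4-byte fields, ASCII text, and values that are always a count followed by length-prefixed strings (which fits the two examples shown, but probably not every type code). The helper names `read_u32`, `read_string` and `read_record` are made up for illustration.

```python
import io
import struct

def read_u32(stream):
    """Read one 4-byte little-endian unsigned integer (assumed byte order)."""
    raw = stream.read(4)
    if len(raw) != 4:
        raise EOFError("truncated 4-byte field")
    return struct.unpack("<I", raw)[0]

def read_string(stream):
    """Read a 4-byte length followed by that many ASCII bytes."""
    length = read_u32(stream)
    raw = stream.read(length)
    if len(raw) != length:
        raise EOFError("truncated string")
    return raw.decode("ascii")

def read_record(stream):
    """Read (name, type code, values) using the guessed layout.

    Assumption: the value is `count` length-prefixed strings. Other
    type codes in the real file may need different handling.
    """
    name = read_string(stream)
    type_code = read_u32(stream)
    count = read_u32(stream)
    values = [read_string(stream) for _ in range(count)]
    return name, type_code, values

# Usage with the first example from the question, rebuilt as bytes.
example = (
    struct.pack("<I", 4) + b"name"
    + struct.pack("<I", 11) + struct.pack("<I", 1)
    + struct.pack("<I", 9) + b"Alexander"
)
record = read_record(io.BytesIO(example))
```

Dispatching on the type code once more of the codes are understood, and handling the rarer layout with the extra zero byte, would be the natural next steps.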
