
How to load/save C++ class instance (using STL containers) to disk

I have a C++ class representing a hierarchically organised data tree which is very large (~Gb, basically as large as I can get away with in memory). It uses an STL list to store information at each node plus iterators to other nodes. Each node has only one parent, but 0-10 children. Abstracted, it looks something like:

struct node {
public:
    node_list_iterator parent;              // iterator to a single parent node
    double node_data_array[X];
    map<int,node_list_iterator> children;   // iterators to child nodes
};

class strategy {
private:
    list<node> tree;        // hierarchically linked list of nodes
    struct some_other_data;
public:
    void build();           // build the tree
    void save();            // save the tree to disk
    void load();            // load the tree from disk
    void use();             // use the tree
};

I would like to implement load() and save() so that the tree can be written to and read from disk, and both should be fairly fast. However, the obvious problems are:

  1. I don't know the size in advance;

  2. The data contains iterators, which are volatile;

  3. My ignorance of C++ is prodigious.

Could anyone suggest a pure C++ solution please?

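You can use the Boost.Serialization library. It can save the state of your STL containers and it tracks pointers between objects. It does not serialize iterators directly, though, so the links between nodes would have to be stored as pointers or indices.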

Boost.Serialization is one solution. Alternatively, IMHO, you could use SQLite plus the visitor pattern to load and save these nodes, but that doesn't sound easy.

It seems like you could save the data using the following grammar:

File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)

That is to say, when serialized the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.

Reading isn't much harder. std::list<node> iterators are stable: once you've inserted the root node, its iterator will not change, not even when you insert its children. Hence, when you read each node you can already set its parent iterator. That leaves the child iterators, but those are trivial, because each node is a child of its parent. So, after you've read all nodes, you fix up the child iterators. Start with the second node (the first node was the root) and iterate to the last node. For each child C, get its parent and add C to the parent's children collection. This means that you have to set the int child IDs aside while reading, but you can do that in a simple std::vector<int> parallel to the std::list<node>. Once you've patched all child IDs into the respective parents, you can discard the vector.
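A minimal sketch of the recursive write and the read-then-fix-up scheme described above might look like the following. It is only an illustration under several assumptions that are not in the question: X is fixed at 4, the root is the first element of the list, the root's parent is set to tree.end(), the file is read on the same platform that wrote it (raw doubles and fixed-width integers), and the names write_node, save_tree, read_node and load_tree are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <iterator>
#include <list>
#include <map>
#include <ostream>
#include <sstream>    // std::stringstream is handy for trying this out in memory
#include <stdexcept>
#include <vector>

constexpr std::size_t X = 4;  // assumed payload size; the question leaves X open

struct node;
using node_list = std::list<node>;
// Naming the iterator while node is still incomplete is accepted by the
// mainstream standard libraries; the question's own struct relies on it too.
using node_list_iterator = node_list::iterator;

struct node {
    node_list_iterator parent;                   // tree.end() marks the root (assumption)
    double node_data_array[X] = {};
    std::map<int, node_list_iterator> children;  // child ID -> child node
};

// Node = Node-data ChildCount NodeList, NodeList = sequence of (int, Node)
void write_node(std::ostream& out, const node& n) {
    out.write(reinterpret_cast<const char*>(n.node_data_array), sizeof n.node_data_array);
    const std::uint32_t count = static_cast<std::uint32_t>(n.children.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const auto& entry : n.children) {
        const std::int32_t id = entry.first;
        out.write(reinterpret_cast<const char*>(&id), sizeof id);
        write_node(out, *entry.second);  // recurse into the child
    }
}

void save_tree(std::ostream& out, const node_list& tree) {
    assert(!tree.empty() && "sketch assumes the root is the first list element");
    write_node(out, tree.front());
}

// Appends one node and, recursively, its descendants to the list. The parent
// iterator is set immediately; the child ID is parked in `ids` for the fix-up.
void read_node(std::istream& in, node_list& tree, std::vector<int>& ids,
               node_list_iterator parent, int id) {
    tree.emplace_back();
    const node_list_iterator self = std::prev(tree.end());
    self->parent = parent;
    ids.push_back(id);
    in.read(reinterpret_cast<char*>(self->node_data_array), sizeof self->node_data_array);
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    if (!in) throw std::runtime_error("truncated tree file");
    for (std::uint32_t i = 0; i < count; ++i) {
        std::int32_t child_id = 0;
        in.read(reinterpret_cast<char*>(&child_id), sizeof child_id);
        read_node(in, tree, ids, self, child_id);
    }
}

void load_tree(std::istream& in, node_list& tree) {
    tree.clear();
    std::vector<int> ids;  // parallel to the list: ids[k] belongs to the k-th node
    read_node(in, tree, ids, tree.end(), 0);  // the root's ID is unused
    // Fix-up pass: skip the root, then register every node with its parent.
    std::size_t k = 1;
    for (auto it = std::next(tree.begin()); it != tree.end(); ++it, ++k)
        it->parent->children[ids[k]] = it;
}
```

Both functions take plain streams, so a std::ofstream or std::ifstream opened with std::ios::binary should work as well as the std::stringstream used for experimenting. The recursion depth equals the depth of the tree, which is worth keeping in mind for very deep hierarchies.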

Boost Serialization has already been suggested, and it's certainly a reasonable possibility.

A great deal depends on how you're going to use the data -- the fact that you're using a multiway tree in memory doesn't mean you necessarily have to store it as a multiway tree on disk. Since you're (apparently) already pushing the limits of what you can store in memory, the obvious question is whether you're just interested in serializing the data so you can re-constitute the same tree when needed, or whether you want something like a database so you can load parts of the information into memory as needed, and update records as needed.

If you want the latter, some of your choices will also depend on how static the structure is. For example, if a particular node has N children, is that number constant or is it subject to change? If it's subject to change, is there a limit on the maximum number of children?

If you do want to be able to traverse the structure on disk, one obvious possibility would be, as you write it out, to substitute the file offset of the appropriate data in place of the iterator you're using in memory.
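As a rough illustration of the file-offset idea, the sketch below first assigns every node a byte offset and then writes records in which each iterator is replaced by the offset of its target. The record layout, the no_offset sentinel and the function names (save_with_offsets, child_offset, payload_at) are assumptions made for this example, as are X = 4, the root being the first list element, and offsets being counted from the start of the stream.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <list>
#include <map>
#include <ostream>
#include <sstream>    // handy for trying the functions on an in-memory stream
#include <unordered_map>

constexpr std::size_t X = 4;                            // assumed payload size
constexpr std::uint64_t no_offset = ~std::uint64_t(0);  // "no parent" / "not found"

struct node;
using node_list = std::list<node>;
using node_list_iterator = node_list::iterator;

struct node {
    node_list_iterator parent;
    double node_data_array[X] = {};
    std::map<int, node_list_iterator> children;
};

template <class T>
void put(std::ostream& out, const T& value) {
    out.write(reinterpret_cast<const char*>(&value), sizeof value);
}

template <class T>
T get(std::istream& in) {
    T value{};
    in.read(reinterpret_cast<char*>(&value), sizeof value);
    return value;
}

// Record layout: parent offset, payload, child count, (child ID, child offset) pairs.
std::uint64_t record_size(const node& n) {
    return sizeof(std::uint64_t) + sizeof n.node_data_array + sizeof(std::uint32_t) +
           n.children.size() * (sizeof(std::int32_t) + sizeof(std::uint64_t));
}

void save_with_offsets(std::ostream& out, const node_list& tree) {
    // Pass 1: decide where every record will live.
    std::unordered_map<const node*, std::uint64_t> offset_of;
    std::uint64_t next = 0;
    for (const node& n : tree) {
        offset_of[&n] = next;
        next += record_size(n);
    }
    // Pass 2: write the records, replacing each iterator by its target's offset.
    for (auto it = tree.begin(); it != tree.end(); ++it) {
        const bool is_root = (it == tree.begin());
        put<std::uint64_t>(out, is_root ? no_offset : offset_of.at(&*it->parent));
        out.write(reinterpret_cast<const char*>(it->node_data_array),
                  sizeof it->node_data_array);
        put<std::uint32_t>(out, static_cast<std::uint32_t>(it->children.size()));
        for (const auto& entry : it->children) {
            put<std::int32_t>(out, entry.first);
            put<std::uint64_t>(out, offset_of.at(&*entry.second));
        }
    }
}

// Follows one child link on disk without loading the tree.
// Returns no_offset if the node has no child with that ID.
std::uint64_t child_offset(std::istream& in, std::uint64_t node_offset, int child_id) {
    assert(node_offset != no_offset);
    in.seekg(static_cast<std::streamoff>(node_offset + sizeof(std::uint64_t) +
                                         sizeof(double) * X));
    const std::uint32_t count = get<std::uint32_t>(in);
    for (std::uint32_t i = 0; i < count; ++i) {
        const std::int32_t id = get<std::int32_t>(in);
        const std::uint64_t offset = get<std::uint64_t>(in);
        if (id == child_id) return offset;
    }
    return no_offset;
}

// Reads element i of the payload of the node stored at node_offset.
double payload_at(std::istream& in, std::uint64_t node_offset, std::size_t i) {
    assert(node_offset != no_offset && i < X);
    in.seekg(static_cast<std::streamoff>(node_offset + sizeof(std::uint64_t) +
                                         sizeof(double) * i));
    return get<double>(in);
}
```

With a layout like this, the root sits at offset 0, and a path through the tree can be followed by repeated calls to child_offset, touching only the records on that path instead of the whole file.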

Alternatively, since it looks like (at least most of) the data in an individual node has a fixed size, you might create a database-like structure of fixed-size records, and store in each record the record numbers of its parent and children.
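A fixed-size record along those lines could be sketched as follows. The layout is an assumption for illustration only: X is taken as 4, the limit of 10 children comes from the question's "0-10 children", and node_record, add_child, write_record and read_record are hypothetical names. Record k simply lives at byte offset k * sizeof(node_record), so any record can be read or updated in place.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <sstream>      // handy for trying the functions on an in-memory stream
#include <type_traits>

constexpr std::size_t X = 4;                  // assumed payload size
constexpr std::int32_t max_children = 10;     // the question mentions 0-10 children
constexpr std::int32_t no_record = -1;        // "no parent"

struct node_record {
    std::int32_t parent = no_record;          // record number of the parent
    std::int32_t child_count = 0;
    std::int32_t child_id[max_children] = {};      // the int keys of the children map
    std::int32_t child_record[max_children] = {};  // record numbers of the children
    double node_data_array[X] = {};
};
static_assert(std::is_trivially_copyable<node_record>::value,
              "records are written to disk as raw bytes");

// Registers a child in the parent's record; fails if all slots are taken.
bool add_child(node_record& parent, std::int32_t id, std::int32_t record) {
    if (parent.child_count >= max_children) return false;
    parent.child_id[parent.child_count] = id;
    parent.child_record[parent.child_count] = record;
    ++parent.child_count;
    return true;
}

// Writes record `index` in place. In this sketch new records are appended in
// index order, so the target position never lies beyond the end of the stream.
void write_record(std::iostream& file, std::int32_t index, const node_record& r) {
    assert(index >= 0);
    file.seekp(static_cast<std::streamoff>(index) *
               static_cast<std::streamoff>(sizeof r));
    file.write(reinterpret_cast<const char*>(&r), sizeof r);
}

// Reads record `index`; the stream's fail state tells the caller if it was missing.
node_record read_record(std::iostream& file, std::int32_t index) {
    assert(index >= 0);
    node_record r;
    file.seekg(static_cast<std::streamoff>(index) *
               static_cast<std::streamoff>(sizeof r));
    file.read(reinterpret_cast<char*>(&r), sizeof r);
    return r;
}
```

Because every record has the same size, changing a node's data means rewriting only that one record, and walking from a node to its parent or children is a matter of reading the record with the stored number. The raw-bytes format is tied to the platform that wrote it.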

Knowing the overall size in advance isn't particularly important (offhand, I can't think of any way I'd use the size even if it was known in advance).

Actually, I think your best option is to move the entire data structure into database tables. That way you get the benefit of people much smarter than you (or me) having dealt with the issues of serialization. It will also save you from having to worry about whether the structure can fit into memory.

I've answered something like this on SO before, so I will summarize:
1. Use a database.
2. Substitute file offsets for links (pointers).
3. Store the data without the tree structure, in records, as a database would.
4. Use XML to create the tree structure, using node names instead of links.
5. This would be soooo much easier if you used a database like SQLite or MySQL.

When you spend too much time on the "serialization" and less on the primary purpose of your project, you need to use a database.

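If you are doing this for persistence, there are several ready-made solutions on the web (try googling "persist std::list"), or you could use mmap to create a file-backed memory region yourself.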
