
How to load/save C++ class instance (using STL containers) to disk

I have a C++ class representing a hierarchically organised data tree which is very large (~GB, basically as large as I can get away with in memory). It uses an STL list to store information at each node, plus iterators to other nodes. Each node has only one parent, but 0-10 children. Abstracted, it looks something like:

#include <list>
#include <map>

struct node;
typedef std::list<node>::iterator node_list_iterator;

struct node {
    node_list_iterator parent;                      // iterator to a single parent node
    double node_data_array[X];                      // X is a compile-time constant
    std::map<int, node_list_iterator> children;     // iterators to child nodes
};

class strategy {
private:
    std::list<node> tree;   // hierarchically linked list of nodes
    struct some_other_data;
public:
    void build();           // build the tree
    void save();            // save the tree to disk
    void load();            // load the tree from disk
    void use();             // use the tree
};

I would like to implement load() and save() to disk, and they should be fairly fast. However, the obvious problems are:

  1. I don't know the size in advance;

  2. The data contains iterators, which are volatile;

  3. My ignorance of C++ is prodigious.

Could anyone suggest a pure C++ solution please?

You can use the Boost.Serialization library. It can save the entire state of your container. Note that it does not serialize iterators as such; the links between nodes would have to be stored as pointers (which it tracks) or as indices.

Boost.Serialization is one solution. Alternatively, IMHO, you could use SQLite plus the visitor pattern to load and save these nodes, but that doesn't sound easy.

It seems like you could save the data using the following grammar:

File = Meta-data Node
Node = Node-data ChildCount NodeList
NodeList = sequence (int, Node)

That is to say, when serialized, the root node contains all nodes, either directly (children) or indirectly (other descendants). Writing the format is fairly straightforward: just have a recursive write function starting at the root node.

Reading isn't that much harder. std::list<node> iterators are stable: once you've inserted the root node, its iterator will not change, not even when you insert its children. Hence, as you read each node you can already set its parent iterator. This of course leaves you with the child iterators, but those are trivial: each node is a child of its parent. So, after you've read all nodes, you fix up the child iterators. Start with the second node, the first child (the first node was the root), and iterate to the last child. Then, for each child C, get its parent and add the child to its parent's collection. Now, this means that you have to set the int child IDs aside while reading, but you can do that in a simple std::vector parallel to the std::list<node>. Once you've patched all child IDs into the respective parents, you can discard the vector.
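The write and read steps described above could be sketched roughly as follows. This is only an illustration under several assumptions: X is set to 4 because the question does not give it, the Meta-data part of the format is left out, the root is assumed to be the first element of the list with its parent set to end(), and the names write_node, read_node, save, load and build_sample are made up for this example. The raw binary layout is not portable between machines, and there is no error handling.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <initializer_list>
#include <iostream>
#include <iterator>
#include <list>
#include <map>
#include <sstream>
#include <vector>

constexpr std::size_t X = 4;  // assumption: the question leaves X unspecified

struct node;
using node_list_iterator = std::list<node>::iterator;

struct node {
    node_list_iterator parent;                   // end() of the owning list for the root
    double node_data_array[X];
    std::map<int, node_list_iterator> children;  // child ID -> child node
};

// Node = Node-data ChildCount NodeList; NodeList = sequence (int, Node).
void write_node(std::ostream& out, const node& n) {
    out.write(reinterpret_cast<const char*>(n.node_data_array), sizeof n.node_data_array);
    const std::uint32_t count = static_cast<std::uint32_t>(n.children.size());
    out.write(reinterpret_cast<const char*>(&count), sizeof count);
    for (const auto& entry : n.children) {
        const std::int32_t id = entry.first;
        out.write(reinterpret_cast<const char*>(&id), sizeof id);
        write_node(out, *entry.second);  // recurse into the child
    }
}

void save(std::ostream& out, const std::list<node>& tree) {
    if (!tree.empty()) write_node(out, tree.front());  // assumes the root is stored first
}

// Reads one node plus its descendants and appends them to `tree` in file order.
// Only the parent iterator is set here; the child ID is set aside in `child_ids`.
void read_node(std::istream& in, std::list<node>& tree, node_list_iterator parent,
               std::vector<int>& child_ids, int id) {
    tree.emplace_back();
    node_list_iterator self = std::prev(tree.end());
    self->parent = parent;
    child_ids.push_back(id);
    in.read(reinterpret_cast<char*>(self->node_data_array), sizeof self->node_data_array);
    std::uint32_t count = 0;
    in.read(reinterpret_cast<char*>(&count), sizeof count);
    for (std::uint32_t i = 0; i < count; ++i) {
        std::int32_t key = 0;
        in.read(reinterpret_cast<char*>(&key), sizeof key);
        read_node(in, tree, self, child_ids, key);
    }
}

void load(std::istream& in, std::list<node>& tree) {
    tree.clear();
    if (in.peek() == std::istream::traits_type::eof()) return;  // empty input, empty tree
    std::vector<int> child_ids;  // parallel to `tree`, discarded after the fix-up
    read_node(in, tree, tree.end(), child_ids, 0);  // the root's ID is unused
    // Fix-up pass: starting at the second node, register each node with its parent.
    std::size_t i = 1;
    for (auto it = std::next(tree.begin()); it != tree.end(); ++it, ++i)
        it->parent->children[child_ids[i]] = it;
}

// Hypothetical helper for trying the sketch: a root with two children, IDs 3 and 7.
void build_sample(std::list<node>& tree) {
    tree.clear();
    tree.emplace_back();
    node_list_iterator root = tree.begin();
    root->parent = tree.end();
    root->node_data_array[0] = 1.5;
    for (int id : {3, 7}) {
        tree.emplace_back();
        node_list_iterator child = std::prev(tree.end());
        child->parent = root;
        child->node_data_array[0] = id * 10.0;
        root->children[id] = child;
    }
}
```

To try it, build a tree with build_sample, call save on a std::stringstream or a std::ofstream opened in binary mode, then call load on a second, empty list. Since the parent iterator is already known while reading, the child entry could also be inserted immediately instead of in a separate pass; the two-pass version is shown because it follows the description above.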

Boost Serialization has already been suggested, and it's certainly a reasonable possibility.

A great deal depends on how you're going to use the data: the fact that you're using a multiway tree in memory doesn't mean you necessarily have to store it as a multiway tree on disk. Since you're (apparently) already pushing the limits of what you can store in memory, the obvious question is whether you're just interested in serializing the data so you can reconstitute the same tree when needed, or whether you want something like a database, so you can load parts of the information into memory as needed and update records as needed.

If you want the latter, some of your choices will also depend on how static the structure is. For example, if a particular node has N children, is that number constant or is it subject to change? If it's subject to change, is there a limit on the maximum number of children?

If you do want to be able to traverse the structure on disk, one obvious possibility would be, as you write it out, to substitute the file offset of the appropriate data in place of the iterator you're using in memory.

Alternatively, since it looks like (at least most of) the data in an individual node has a fixed size, you might create a database-like structure of fixed-size records, and store in each record the record numbers of the parent/children.
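A fixed-size record layout along those lines might look like the sketch below. Everything in it is an assumption made for illustration: X = 4, a cap of 10 children (taken from the question's "0-10 children"), the record struct, and the helper names make_record, write_record, read_record and depth_of. The int keys of the children map are left out for brevity, and the raw struct dump is only readable on a machine with the same layout and byte order.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iostream>
#include <sstream>
#include <type_traits>

constexpr std::size_t X = 4;              // assumption: the question leaves X unspecified
constexpr std::size_t kMaxChildren = 10;  // the question mentions 0-10 children
constexpr std::int64_t kNone = -1;        // marks "no parent" or an unused child slot

struct record {
    std::int64_t parent;                  // record number of the parent, kNone for the root
    std::int64_t child_count;
    std::int64_t children[kMaxChildren];  // record numbers of the children
    double data[X];
};
static_assert(std::is_trivially_copyable<record>::value, "record is written as raw bytes");

record make_record(std::int64_t parent, double first_value) {
    record r{};
    r.parent = parent;
    for (auto& c : r.children) c = kNone;
    r.data[0] = first_value;
    return r;
}

// Record number `index` lives at byte offset index * sizeof(record).
void write_record(std::ostream& out, std::int64_t index, const record& r) {
    out.seekp(static_cast<std::streamoff>(index) * static_cast<std::streamoff>(sizeof(record)));
    out.write(reinterpret_cast<const char*>(&r), sizeof r);
}

bool read_record(std::istream& in, std::int64_t index, record& r) {
    in.seekg(static_cast<std::streamoff>(index) * static_cast<std::streamoff>(sizeof(record)));
    in.read(reinterpret_cast<char*>(&r), sizeof r);
    return static_cast<bool>(in);
}

// Follows parent links from `index` up to the root, loading one record at a time.
int depth_of(std::istream& in, std::int64_t index) {
    int depth = 0;
    record r;
    while (read_record(in, index, r) && r.parent != kNone) {
        index = r.parent;
        ++depth;
    }
    return depth;
}
```

Because every record has the same size, the record number doubles as a file offset (multiply by sizeof(record)), so a single node can be read or rewritten in place without loading the rest of the tree. With a std::fstream opened in binary mode the same functions would operate on a file on disk.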

Knowing the overall size in advance isn't particularly important (offhand, I can't think of any way I'd use the size even if it were known in advance).

Actually, I think your best option is to move the entire data structure into database tables. That way you get the benefit of people much smarter than you (or me) having dealt with the issues of serialization. It will also save you from having to worry about whether the structure can fit into memory.

I've answered something like this on SO before, so I will summarize:
1. Use a database.
2. Substitute file offsets for links (pointers).
3. Store the data without the tree structure, in records, as a database would.
4. Use XML to create the tree structure, using node names instead of links.
5. This would be soooo much easier if you used a database like SQLite or MySQL.

When you spend too much time on the "serialization" and less on the primary purpose of your project, you need to use a database.

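If you are doing this for persistence, there are several solutions available on the web (e.g. google "persist std::list"), or you can roll your own by using mmap to create a file-backed memory region.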
