
Persistence Strategies for main-memory B+ Trees

I am trying to develop a main-memory index for key-value pairs in C++, and I need to make sure the index is recoverable after a crash. I am using a CSB+-Tree implementation (BSD licence) that I found here. The main challenge I am facing is preserving the parent-child relationships when the nodes are re-instantiated. I have looked into various strategies for saving a tree structure to disk and recovering it. Some of them are:

  1. Saving the node objects in pre-order and writing NULLs for empty child pointers.
  2. Giving each node an ID, writing the ID of a child instead of its pointer when saving to disk, and then resolving the IDs back to pointers during re-instantiation.
  3. Saving file offsets (positions of the child records within the file) rather than the main-memory addresses of the child nodes. This might mean I have to save from the leaves up, since a child's offset must be known before its parent is written.
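
As a rough illustration of strategy 3, the sketch below writes every child before its parent (post-order), so each parent record can store the file offsets of its children. `OffsetNode`, `saveTree`, `loadTree` and the record layout are hypothetical names and choices made for this example; they are not the CSB+-Tree layout, and the raw host-endian writes assume the file is read back on the same kind of machine.

```cpp
#include <cassert>
#include <cstdint>
#include <ios>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

// Hypothetical simplified node, not the CSB+-Tree layout from the question.
struct OffsetNode {
    std::vector<std::int32_t> keys;
    std::vector<std::unique_ptr<OffsetNode>> children;  // empty for a leaf
};

// Raw host-endian I/O helpers; only meant for trivially copyable types.
template <class T>
void writePod(std::ostream& out, const T& v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
template <class T>
T readPod(std::istream& in) {
    T v{};
    in.read(reinterpret_cast<char*>(&v), sizeof v);
    return v;
}

// Writes the subtree rooted at n in post-order; returns the offset of n's record.
std::uint64_t saveNode(const OffsetNode& n, std::ostream& out) {
    std::vector<std::uint64_t> childOffsets;
    for (const auto& c : n.children) childOffsets.push_back(saveNode(*c, out));
    const std::streamoff pos = out.tellp();
    assert(pos >= 0);
    writePod(out, static_cast<std::uint32_t>(n.keys.size()));
    for (std::int32_t k : n.keys) writePod(out, k);
    writePod(out, static_cast<std::uint32_t>(childOffsets.size()));
    for (std::uint64_t off : childOffsets) writePod(out, off);
    return static_cast<std::uint64_t>(pos);
}

// The root is written last, so its offset is stored in an 8-byte trailer.
void saveTree(const OffsetNode& root, std::ostream& out) {
    const std::uint64_t rootOffset = saveNode(root, out);
    writePod(out, rootOffset);
}

std::unique_ptr<OffsetNode> loadNode(std::istream& in, std::uint64_t offset) {
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    auto node = std::make_unique<OffsetNode>();
    const auto numKeys = readPod<std::uint32_t>(in);
    for (std::uint32_t i = 0; i < numKeys; ++i)
        node->keys.push_back(readPod<std::int32_t>(in));
    const auto numChildren = readPod<std::uint32_t>(in);
    // Read all offsets first: the recursive calls move the read position.
    std::vector<std::uint64_t> childOffsets;
    for (std::uint32_t i = 0; i < numChildren; ++i)
        childOffsets.push_back(readPod<std::uint64_t>(in));
    for (std::uint64_t off : childOffsets)
        node->children.push_back(loadNode(in, off));
    return node;
}

std::unique_ptr<OffsetNode> loadTree(std::istream& in) {
    in.seekg(-static_cast<std::streamoff>(sizeof(std::uint64_t)), std::ios::end);
    const auto rootOffset = readPod<std::uint64_t>(in);
    return loadNode(in, rootOffset);
}
```

With a real file, `std::ofstream` and `std::ifstream` opened in binary mode would take the place of the stream arguments. Storing the root offset in a trailer avoids seeking back to patch a header, at the cost of one seek to the end of the file on load.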

I have also looked at a couple of serialization libraries: Google Protocol Buffers and Boost Serialization.

The nodes in the implementation have a number of pointer members. Some of these point to other nodes, while others point to key values. The code below is a simplified version that retains the essence.

#include <cstdint>

struct NodeEntry {
    uint16_t offset;  // byte offset of the key, relative to the NodeHead
    uint8_t next;     // index of the next entry; 0xff means null
    uint8_t num;      // [0]: number of entries in use
};

struct NodeHead {
    NodeHead *null;        // null indicates internal node
    char *children;        // pointer to the children
    NodeEntry entries[1];  // entry array
};
I was thinking of writing the entry values directly into the saved data for each NodeHead rather than saving a link, and of giving each NodeHead instance an ID that is used to maintain the "children" relationships. I would like some advice on whether this can be done in a better way.
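
The plan described above (keys written inline in each record, children referenced by ID) could look roughly like the following sketch. `IdNode`, `saveById` and `loadById` are hypothetical names, the node type is a simplified stand-in rather than the real `NodeHead`, and IDs are simply assigned in pre-order. Loading takes two passes: the first recreates every node, the second turns the stored IDs back into pointers.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <unordered_map>
#include <vector>

// Hypothetical simplified node; the real NodeHead layout is more compact.
struct IdNode {
    std::vector<std::int32_t> keys;
    std::vector<IdNode*> children;  // empty for a leaf
};

// Raw host-endian I/O helpers; only meant for trivially copyable types.
template <class T>
void writePod(std::ostream& out, const T& v) {
    out.write(reinterpret_cast<const char*>(&v), sizeof v);
}
template <class T>
T readPod(std::istream& in) {
    T v{};
    in.read(reinterpret_cast<char*>(&v), sizeof v);
    return v;
}

// Numbers the nodes in pre-order, so the root always gets ID 0.
void assignIds(const IdNode* n,
               std::unordered_map<const IdNode*, std::uint32_t>& ids,
               std::vector<const IdNode*>& order) {
    ids[n] = static_cast<std::uint32_t>(order.size());
    order.push_back(n);
    for (const IdNode* c : n->children) assignIds(c, ids, order);
}

void saveById(const IdNode* root, std::ostream& out) {
    assert(root != nullptr);
    std::unordered_map<const IdNode*, std::uint32_t> ids;
    std::vector<const IdNode*> order;
    assignIds(root, ids, order);
    writePod(out, static_cast<std::uint32_t>(order.size()));
    for (const IdNode* n : order) {
        writePod(out, static_cast<std::uint32_t>(n->keys.size()));
        for (std::int32_t k : n->keys) writePod(out, k);       // keys stored inline
        writePod(out, static_cast<std::uint32_t>(n->children.size()));
        for (const IdNode* c : n->children) writePod(out, ids.at(c));  // ID, not pointer
    }
}

// Returns the owning storage for all nodes; element 0 is the root.
std::vector<std::unique_ptr<IdNode>> loadById(std::istream& in) {
    const auto count = readPod<std::uint32_t>(in);
    std::vector<std::unique_ptr<IdNode>> nodes;
    std::vector<std::vector<std::uint32_t>> childIds(count);
    for (std::uint32_t i = 0; i < count; ++i) {   // pass 1: create the nodes
        nodes.push_back(std::make_unique<IdNode>());
        const auto numKeys = readPod<std::uint32_t>(in);
        for (std::uint32_t k = 0; k < numKeys; ++k)
            nodes[i]->keys.push_back(readPod<std::int32_t>(in));
        const auto numChildren = readPod<std::uint32_t>(in);
        for (std::uint32_t c = 0; c < numChildren; ++c)
            childIds[i].push_back(readPod<std::uint32_t>(in));
    }
    for (std::uint32_t i = 0; i < count; ++i)      // pass 2: resolve IDs to pointers
        for (std::uint32_t id : childIds[i])
            nodes[i]->children.push_back(nodes.at(id).get());
    return nodes;
}
```

Because a CSB+-Tree keeps the children of a node contiguously and stores a single pointer to them, a version adapted to the real layout would probably need only one ID per node (that of the first child) instead of a list.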

Are the (key, value) pairs kept separately on disk, or do you need to persist them along with the index? Do you keep the data itself in memory, or is only the index memory-resident while the data lives on disk? If the whole dataset is memory-resident, don't persist the tree structure at all. Just save the ordered list of (key, value) pairs and rebuild the tree on load. I have never used that library, but any reasonable B-tree implementation should be able to build an in-memory B-tree from a pre-sorted stream of records very efficiently.
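
A minimal sketch of that approach, under stated assumptions: only the sorted pairs are written to disk, and on load a tree is rebuilt bottom-up by packing leaves first and then stacking parent levels on top. `BNode`, `savePairs`, `loadPairs`, `bulkLoad` and `find` are hypothetical names for this illustration and have nothing to do with the CSB+-Tree library's API; the packing is naive, so the rightmost node on each level may end up underfull.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>
#include <vector>

using Pair = std::pair<std::int32_t, std::int32_t>;  // (key, value)

// Hypothetical in-memory B+-tree node for illustration only.
struct BNode {
    bool leaf = true;
    std::vector<std::int32_t> keys;    // leaf: keys; internal: separator keys
    std::vector<std::int32_t> values;  // leaf only
    std::vector<std::unique_ptr<BNode>> children;  // internal only
};

// Persist nothing but the sorted pairs (raw host-endian bytes).
void savePairs(const std::vector<Pair>& sorted, std::ostream& out) {
    const auto n = static_cast<std::uint64_t>(sorted.size());
    out.write(reinterpret_cast<const char*>(&n), sizeof n);
    for (const Pair& p : sorted) {
        out.write(reinterpret_cast<const char*>(&p.first), sizeof p.first);
        out.write(reinterpret_cast<const char*>(&p.second), sizeof p.second);
    }
}

std::vector<Pair> loadPairs(std::istream& in) {
    std::uint64_t n = 0;
    in.read(reinterpret_cast<char*>(&n), sizeof n);
    std::vector<Pair> pairs(static_cast<std::size_t>(n));
    for (Pair& p : pairs) {
        in.read(reinterpret_cast<char*>(&p.first), sizeof p.first);
        in.read(reinterpret_cast<char*>(&p.second), sizeof p.second);
    }
    return pairs;
}

// Builds the tree level by level from input that is already sorted by key.
std::unique_ptr<BNode> bulkLoad(const std::vector<Pair>& sorted, std::size_t fanout) {
    assert(fanout >= 2);
    std::vector<std::unique_ptr<BNode>> level;
    std::vector<std::int32_t> minKeys;  // smallest key under each node of `level`
    for (std::size_t i = 0; i < sorted.size(); i += fanout) {
        auto leafNode = std::make_unique<BNode>();
        const std::size_t end = std::min(i + fanout, sorted.size());
        for (std::size_t j = i; j < end; ++j) {
            leafNode->keys.push_back(sorted[j].first);
            leafNode->values.push_back(sorted[j].second);
        }
        minKeys.push_back(sorted[i].first);
        level.push_back(std::move(leafNode));
    }
    if (level.empty()) return std::make_unique<BNode>();  // empty tree: one empty leaf
    while (level.size() > 1) {
        std::vector<std::unique_ptr<BNode>> parents;
        std::vector<std::int32_t> parentMin;
        for (std::size_t i = 0; i < level.size(); i += fanout) {
            auto parent = std::make_unique<BNode>();
            parent->leaf = false;
            const std::size_t end = std::min(i + fanout, level.size());
            for (std::size_t j = i; j < end; ++j) {
                if (j > i) parent->keys.push_back(minKeys[j]);  // separator key
                parent->children.push_back(std::move(level[j]));
            }
            parentMin.push_back(minKeys[i]);
            parents.push_back(std::move(parent));
        }
        level = std::move(parents);
        minKeys = std::move(parentMin);
    }
    return std::move(level.front());
}

// Returns a pointer to the value stored for key, or nullptr if it is absent.
const std::int32_t* find(const BNode* n, std::int32_t key) {
    while (!n->leaf) {
        const auto it = std::upper_bound(n->keys.begin(), n->keys.end(), key);
        n = n->children[static_cast<std::size_t>(it - n->keys.begin())].get();
    }
    const auto it = std::lower_bound(n->keys.begin(), n->keys.end(), key);
    if (it == n->keys.end() || *it != key) return nullptr;
    return &n->values[static_cast<std::size_t>(it - n->keys.begin())];
}
```

The saved file contains no pointers, IDs or offsets at all, so there is nothing to resolve on load. The trade-off is that the whole tree is rebuilt at startup, which is linear in the number of pairs when the input is already sorted.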

