
Persistence Strategies for main-memory B+ Trees

I am trying to develop a main-memory index for key-value pairs in C++, and I need the index to be recoverable after a crash. I am using a CSB+-Tree implementation (BSD Licence) that I found here. The main challenge I am facing is preserving the parent-child relationships after re-instantiating the nodes. I have searched for various strategies for saving a tree structure to disk and recovering it. Some of them are:

  1. Saving the node objects in pre-order and writing NULLs for empty child pointers.
  2. Giving IDs to nodes, saving a node's ID instead of its pointer when writing to disk, and then resolving the pointers from the IDs during re-instantiation.
  3. Saving file-offset values (positions in the file on disk) rather than the main-memory addresses of the child nodes. This might mean I have to save bottom-up, starting from the leaves.
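Strategy 3 can be sketched as follows, assuming a hypothetical simplified node type (`Node`, not the CSB+-Tree's layout) and a native-endian binary encoding; `writeBottomUp`, `readAt` and `roundTrip` are illustrative names. Writing each node only after its children (post-order) is what forces the leaf-up order: a parent can store its children's file offsets only once those children have been written.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical, simplified node (not the CSB+-Tree's layout):
// sorted keys plus owned children; a leaf has no children.
struct Node {
    std::vector<int32_t> keys;
    std::vector<std::unique_ptr<Node>> children;
};

template <typename T>
void writeRaw(std::ostream& out, const T* p, std::size_t count) {
    out.write(reinterpret_cast<const char*>(p),
              static_cast<std::streamsize>(count * sizeof(T)));
}

template <typename T>
void readRaw(std::istream& in, T* p, std::size_t count) {
    in.read(reinterpret_cast<char*>(p),
            static_cast<std::streamsize>(count * sizeof(T)));
}

// Post-order ("leaf-up") write: children are written first, so their file
// offsets are known when the parent is written. The parent stores those
// offsets in place of child pointers. Returns the offset of n itself.
uint64_t writeBottomUp(const Node& n, std::ostream& out) {
    std::vector<uint64_t> childOffsets;
    for (const auto& child : n.children)
        childOffsets.push_back(writeBottomUp(*child, out));

    const uint64_t self = static_cast<uint64_t>(out.tellp());
    const uint32_t nk = static_cast<uint32_t>(n.keys.size());
    const uint32_t nc = static_cast<uint32_t>(childOffsets.size());
    writeRaw(out, &nk, 1);
    writeRaw(out, &nc, 1);
    writeRaw(out, n.keys.data(), nk);
    writeRaw(out, childOffsets.data(), nc);
    return self;
}

// Re-instantiation: seek to a stored offset, read the node, then follow each
// child offset, turning file offsets back into heap pointers.
std::unique_ptr<Node> readAt(std::istream& in, uint64_t offset) {
    in.seekg(static_cast<std::streamoff>(offset), std::ios::beg);
    uint32_t nk = 0, nc = 0;
    readRaw(in, &nk, 1);
    readRaw(in, &nc, 1);
    auto n = std::make_unique<Node>();
    n->keys.resize(nk);
    readRaw(in, n->keys.data(), nk);
    std::vector<uint64_t> childOffsets(nc);
    readRaw(in, childOffsets.data(), nc);
    assert(in && "truncated or corrupt input");
    for (uint64_t off : childOffsets)
        n->children.push_back(readAt(in, off));
    return n;
}

// Round trip through an in-memory stream; a std::fstream works the same way.
std::unique_ptr<Node> roundTrip(const Node& root) {
    std::stringstream buf(std::ios::in | std::ios::out | std::ios::binary);
    const uint64_t rootOffset = writeBottomUp(root, buf);
    return readAt(buf, rootOffset);
}
```

Because the root is written last, its offset has to be recorded separately when using a real file, for example in a fixed-size header at the start of the file.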

I have also looked at a couple of serialization libraries: Google Protocol Buffers and Boost Serialization.

Now, the nodes in the implementation have a number of pointer variables. Some of these point to other nodes, while others point to key values. The code below is a simplified version that retains the essence.

#include <cstdint>

struct NodeEntry // declared first, because NodeHead embeds it by value
{
    uint16_t offset; // byte offset of the key, relative to the start of the NodeHead
    uint8_t next;    // index of the next entry; 0xff means null
    uint8_t num;     // number of entries in use (kept in entries[0])
};

struct NodeHead
{
    NodeHead *null;       // a NULL value indicates an internal node
    char *children;       // pointer to the children
    NodeEntry entries[1]; // entry array
};

I was thinking of writing the entry values directly into the NodeHead's data rather than saving a link, giving each NodeHead instance an ID, and using those IDs to maintain the "children" relationships. I would like some advice on whether this can be done in a better way.
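The ID-based approach (strategy 2, combined with embedding the key values in each record) can be sketched with two passes on load. The types `PNode` and `NodeRecord` and the functions `saveWithIds`/`loadWithIds` are hypothetical; for brevity each node keeps one pointer per child, whereas the `NodeHead` above has a single `children` pointer, so one ID per record might suffice there. Encoding the records into a file is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <unordered_map>
#include <vector>

// Hypothetical in-memory node with raw child pointers (one per child).
struct PNode {
    std::vector<int32_t> keys;
    std::vector<PNode*> children;  // empty for leaves
};

// Hypothetical on-disk record: keys stored by value, child pointers replaced
// by node IDs. A node's ID is simply its index in the saved vector.
struct NodeRecord {
    std::vector<int32_t> keys;
    std::vector<uint32_t> childIds;
};

// Save: number nodes breadth-first (the root gets ID 0), then emit one record
// per node with every child pointer translated to that child's ID.
std::vector<NodeRecord> saveWithIds(const PNode* root) {
    std::vector<const PNode*> order{root};
    std::unordered_map<const PNode*, uint32_t> idOf{{root, 0}};
    for (std::size_t i = 0; i < order.size(); ++i)
        for (const PNode* c : order[i]->children) {
            idOf.emplace(c, static_cast<uint32_t>(order.size()));
            order.push_back(c);
        }
    std::vector<NodeRecord> records;
    for (const PNode* n : order) {
        NodeRecord r{n->keys, {}};
        for (const PNode* c : n->children) r.childIds.push_back(idOf.at(c));
        records.push_back(std::move(r));
    }
    return records;
}

// Load in two passes: (1) allocate every node so each ID has an address,
// (2) resolve child IDs to those addresses. The returned vector owns the
// nodes; element 0 is the root.
std::vector<std::unique_ptr<PNode>> loadWithIds(const std::vector<NodeRecord>& records) {
    std::vector<std::unique_ptr<PNode>> nodes;
    nodes.reserve(records.size());
    for (const NodeRecord& r : records) {
        auto n = std::make_unique<PNode>();
        n->keys = r.keys;
        nodes.push_back(std::move(n));
    }
    for (std::size_t i = 0; i < records.size(); ++i)
        for (uint32_t id : records[i].childIds) {
            assert(id < nodes.size() && "dangling child ID");
            nodes[i]->children.push_back(nodes[id].get());
        }
    return nodes;
}
```

Using the record's index as its ID lets the loader resolve IDs by direct indexing into the node vector, so no ID-to-pointer lookup table is needed at load time.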

Are the (key, value) pairs kept separately on disk, or do you need to persist them along with the index? Do you keep the data itself in memory, or is only the index memory-resident while the data is on disk? If the whole dataset is memory-resident, don't persist the tree structure at all. Just save the ordered list of (key, value) pairs and rebuild the tree on load. I have never used that library, but any reasonable B-tree implementation should be able to build an in-memory B-tree from a pre-sorted stream of records very efficiently, in a single linear pass.
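A minimal sketch of that suggestion, assuming a hypothetical minimal B+-tree node (`BNode`) rather than the CSB+-Tree library's API: `collectInOrder` produces the sorted (key, value) list to save, and `bulkLoad` rebuilds the tree bottom-up from it in one pass, packing each leaf with up to `fanout` entries and then building each parent level from the first key under each child.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <utility>
#include <vector>

using KV = std::pair<uint64_t, uint64_t>;

// Hypothetical minimal B+-tree node; not the CSB+-Tree library's API.
struct BNode {
    std::vector<uint64_t> keys;                    // leaf: keys; internal: separators
    std::vector<uint64_t> values;                  // leaf only
    std::vector<std::unique_ptr<BNode>> children;  // internal only
    bool isLeaf() const { return children.empty(); }
};

// "Save" step: an in-order walk of the leaves yields the sorted pairs.
void collectInOrder(const BNode& n, std::vector<KV>& out) {
    if (n.isLeaf()) {
        for (std::size_t i = 0; i < n.keys.size(); ++i) out.emplace_back(n.keys[i], n.values[i]);
        return;
    }
    for (const auto& c : n.children) collectInOrder(*c, out);
}

// "Load" step: bulk load from pairs already sorted by key.
// The last node on each level may be underfull; this sketch does not rebalance it.
std::unique_ptr<BNode> bulkLoad(const std::vector<KV>& sorted, std::size_t fanout) {
    assert(fanout >= 2);
    std::vector<std::unique_ptr<BNode>> level;
    std::vector<uint64_t> firstKeys;  // smallest key under each node in `level`
    for (std::size_t i = 0; i < sorted.size(); i += fanout) {
        auto leaf = std::make_unique<BNode>();
        for (std::size_t j = i; j < std::min(i + fanout, sorted.size()); ++j) {
            leaf->keys.push_back(sorted[j].first);
            leaf->values.push_back(sorted[j].second);
        }
        firstKeys.push_back(leaf->keys.front());
        level.push_back(std::move(leaf));
    }
    if (level.empty()) return std::make_unique<BNode>();  // empty tree: one empty leaf
    while (level.size() > 1) {
        std::vector<std::unique_ptr<BNode>> parents;
        std::vector<uint64_t> parentFirstKeys;
        for (std::size_t i = 0; i < level.size(); i += fanout) {
            auto parent = std::make_unique<BNode>();
            const std::size_t end = std::min(i + fanout, level.size());
            for (std::size_t j = i; j < end; ++j) {
                if (j > i) parent->keys.push_back(firstKeys[j]);  // separator before child j
                parent->children.push_back(std::move(level[j]));
            }
            parentFirstKeys.push_back(firstKeys[i]);
            parents.push_back(std::move(parent));
        }
        level = std::move(parents);
        firstKeys = std::move(parentFirstKeys);
    }
    return std::move(level.front());
}

// Point lookup to check the rebuilt tree: route by separators, then search the leaf.
bool find(const BNode& root, uint64_t key, uint64_t& value) {
    const BNode* n = &root;
    while (!n->isLeaf()) {
        auto i = std::upper_bound(n->keys.begin(), n->keys.end(), key) - n->keys.begin();
        n = n->children[static_cast<std::size_t>(i)].get();
    }
    auto it = std::lower_bound(n->keys.begin(), n->keys.end(), key);
    if (it == n->keys.end() || *it != key) return false;
    value = n->values[static_cast<std::size_t>(it - n->keys.begin())];
    return true;
}
```

Writing the sorted pairs to disk is then a plain sequential write. A production loader would typically also leave some free space in each node so that inserts after recovery do not immediately split every node.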
