Creating an index file using B+ trees

I have this interesting challenge where a B+ tree is the index for a data file. This index has to be saved in an index file, and later we have to load the index back into memory from this index file.

This is the B+ tree structure I have, from http://www.amittai.com/prose/bpt.c:

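#include <stdbool.h>  /* for bool */

typedef struct node
{
    void ** pointers;      /* internal node: child nodes; leaf: records (the last slot links to the next leaf) */
    int * keys;            /* sorted keys, num_keys of them in use */
    struct node * parent;  /* NULL for the root */
    bool is_leaf;
    int num_keys;
} node;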

As you can see, the code for the tree is pretty neat and works properly, so now I have a B+ tree functioning as an index. But I can't simply write each node of the tree into a file: the pointers written there are memory addresses from the current run and would be meaningless in a new execution. Implementation-wise, how do I even start creating an index file from a B+ tree? Keep in mind that after it is created, the index has to be loaded back into memory from the index file.

Translate your pointers to IDs (there should be a one-to-one correspondence between a node's memory address and its unique ID). Then traverse the tree in post-order (postfix order), or some other fixed order, and write out the IDs instead of the pointers.
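A minimal sketch of the writing side in C, assuming the `node` struct from the question, a hypothetical `record` type holding a single `int value` (modeled on the small record struct in bpt.c), and a made-up binary layout: an `int32_t` node count, then one entry per node (`is_leaf`, `num_keys`, the keys, then either the child IDs or the leaf's record values). The hypothetical `save_index` hands out IDs 0, 1, 2, ... in post-order, so a node's ID is simply its position in the file and no address-to-ID hash table is needed: each recursive call returns the ID of the subtree it just wrote.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type (modeled on bpt.c's record struct). */
typedef struct record {
    int value;
} record;

/* Node layout from the question (bpt.c). */
typedef struct node {
    void **pointers;
    int *keys;
    struct node *parent;
    bool is_leaf;
    int num_keys;
} node;

/* Write one 32-bit integer in native byte order; 0 on success. */
static int put_i32(FILE *f, int32_t v) {
    return fwrite(&v, sizeof v, 1, f) == 1 ? 0 : -1;
}

/* Count nodes so the loader can size its ID table before reading any node. */
static int32_t count_nodes(const node *n) {
    int32_t total = 1;
    if (!n->is_leaf)
        for (int i = 0; i <= n->num_keys; i++)
            total += count_nodes(n->pointers[i]);
    return total;
}

/* Write n's subtree in post-order and return n's ID, or -1 on error.
   IDs are handed out 0, 1, 2, ... in write order, so a node's ID is its
   position in the file; children are written first, so their IDs are
   already known when the parent is written. */
static int32_t write_subtree(FILE *f, const node *n, int32_t *next_id) {
    int32_t *child_ids = NULL;
    if (!n->is_leaf) {
        child_ids = malloc((size_t)(n->num_keys + 1) * sizeof *child_ids);
        if (child_ids == NULL)
            return -1;
        for (int i = 0; i <= n->num_keys; i++) {
            child_ids[i] = write_subtree(f, n->pointers[i], next_id);
            if (child_ids[i] < 0) {
                free(child_ids);
                return -1;
            }
        }
    }
    int ok = put_i32(f, n->is_leaf) == 0 && put_i32(f, n->num_keys) == 0;
    for (int i = 0; ok && i < n->num_keys; i++)
        ok = put_i32(f, n->keys[i]) == 0;
    for (int i = 0; ok && !n->is_leaf && i <= n->num_keys; i++)
        ok = put_i32(f, child_ids[i]) == 0;            /* IDs instead of pointers */
    for (int i = 0; ok && n->is_leaf && i < n->num_keys; i++)
        ok = put_i32(f, ((const record *)n->pointers[i])->value) == 0;
    free(child_ids);
    return ok ? (*next_id)++ : -1;
}

/* Layout: int32 node count, then the nodes in post-order (root last). */
int save_index(FILE *f, const node *root) {
    int32_t next_id = 0;
    if (root == NULL)
        return put_i32(f, 0);
    if (put_i32(f, count_nodes(root)) != 0)
        return -1;
    return write_subtree(f, root, &next_id) < 0 ? -1 : 0;
}
```

Post-order is convenient because every child is written before its parent, so its ID is already known. The sketch writes in native byte order and does not store bpt.c's leaf-to-leaf link (the last pointer slot of each leaf), since that can be rebuilt on load. In an index over a data file, a leaf would more likely store each record's byte offset in that file than the record's contents.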

To reconstruct the tree, simply read the data back from the file, and for each new ID you see, malloc some memory for that node. Keep a table from ID to address so that every stored ID can be replaced by the address of the node it refers to. (If your IDs are sequential starting from zero, you can do this quickly and easily by using a fixed-size array indexed by ID as that table instead of a hash table.)
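A matching loader sketch in C, assuming a hypothetical layout of an `int32_t` node count followed by the nodes in post-order (`is_leaf`, `num_keys`, the keys, then child IDs for internal nodes or record values for leaves), and the same hypothetical `record` type with one `int value`. Because the IDs run from 0 to count - 1, `by_id` is just a `calloc`ed array of node addresses. Each node gets room for `order` pointers and `order - 1` keys, roughly mirroring how bpt.c's `make_node` allocates, so bpt.c-style insertion could continue on the loaded tree. With post-order, every child ID refers to a node that has already been loaded, so child pointers and `parent` links are filled in as soon as the parent is read, and leaves arrive left to right, which lets the loader relink the leaf chain.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical record type (modeled on bpt.c's record struct). */
typedef struct record {
    int value;
} record;

/* Node layout from the question (bpt.c). */
typedef struct node {
    void **pointers;
    int *keys;
    struct node *parent;
    bool is_leaf;
    int num_keys;
} node;

/* Read one 32-bit integer in native byte order; 0 on success. */
static int get_i32(FILE *f, int32_t *v) {
    return fread(v, sizeof *v, 1, f) == 1 ? 0 : -1;
}

/* Rebuild the tree; returns the root, or NULL for an empty or malformed
   file. Partially built nodes are not freed on error, for brevity. */
node *load_index(FILE *f, int order) {
    int32_t count;
    if (order < 3 || get_i32(f, &count) != 0 || count <= 0)
        return NULL;
    /* IDs are 0..count-1, so a plain array is the ID -> address table. */
    node **by_id = calloc((size_t)count, sizeof *by_id);
    if (by_id == NULL)
        return NULL;
    node *root = NULL, *prev_leaf = NULL;
    for (int32_t id = 0; id < count; id++) {
        int32_t is_leaf, num_keys, v;
        if (get_i32(f, &is_leaf) || get_i32(f, &num_keys) ||
            num_keys < 0 || num_keys > order - 1)
            goto done;
        /* A new ID: allocate memory for it, sized like bpt.c's nodes. */
        node *n = calloc(1, sizeof *n);
        if (n == NULL)
            goto done;
        n->keys = calloc((size_t)(order - 1), sizeof *n->keys);
        n->pointers = calloc((size_t)order, sizeof *n->pointers);
        if (n->keys == NULL || n->pointers == NULL)
            goto done;
        n->is_leaf = is_leaf != 0;
        n->num_keys = num_keys;
        for (int i = 0; i < num_keys; i++) {
            if (get_i32(f, &v))
                goto done;
            n->keys[i] = v;
        }
        if (n->is_leaf) {
            for (int i = 0; i < num_keys; i++) {
                record *r = malloc(sizeof *r);
                if (r == NULL || get_i32(f, &v))
                    goto done;
                r->value = v;
                n->pointers[i] = r;
            }
            /* Leaves arrive left to right: relink the leaf chain. */
            if (prev_leaf != NULL)
                prev_leaf->pointers[order - 1] = n;
            prev_leaf = n;
        } else {
            for (int i = 0; i <= num_keys; i++) {
                /* Post-order: a child's ID is always smaller than its parent's. */
                if (get_i32(f, &v) || v < 0 || v >= id || by_id[v] == NULL)
                    goto done;
                by_id[v]->parent = n;
                n->pointers[i] = by_id[v];   /* ID replaced by an address */
            }
        }
        by_id[id] = n;
    }
    root = by_id[count - 1];   /* the root was written last */
done:
    free(by_id);
    return root;
}
```

Call `load_index(f, order)` with the same order the tree was built with; the ID table is only needed during loading and is freed before returning.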

There are probably much cleverer ways to do this if you want to load the index really fast.
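One such approach, offered as an illustration rather than as what the answer necessarily had in mind: give every node a fixed-size on-disk record and refer to children by array index instead of by address. The whole index can then be written with one `fwrite` and loaded with one `fread`, with no per-node `malloc` and no pointer fix-up, and a search just follows indices. In this sketch `ORDER`, `disk_node`, `save_nodes`, `load_nodes`, and `find_key` are hypothetical names, node 0 is assumed to be the root, and leaves keep their record values (in a real index, more likely data-file offsets) in `refs`.

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

#define ORDER 4  /* hypothetical fixed order: room for ORDER - 1 keys per node */

/* Fixed-size node: children are referenced by array index, not address. */
typedef struct {
    int32_t is_leaf;
    int32_t num_keys;
    int32_t keys[ORDER - 1];
    int32_t refs[ORDER];  /* internal: child indices; leaf: record values,
                             with refs[ORDER - 1] = next leaf index or -1 */
} disk_node;

/* Write the node count, then the whole array in a single call. */
int save_nodes(FILE *f, const disk_node *nodes, int32_t count) {
    if (fwrite(&count, sizeof count, 1, f) != 1)
        return -1;
    return fwrite(nodes, sizeof *nodes, (size_t)count, f) == (size_t)count ? 0 : -1;
}

/* Load the whole array with one fread: no per-node malloc, no fix-up pass. */
disk_node *load_nodes(FILE *f, int32_t *count) {
    if (fread(count, sizeof *count, 1, f) != 1 || *count <= 0)
        return NULL;
    disk_node *nodes = malloc((size_t)*count * sizeof *nodes);
    if (nodes != NULL && fread(nodes, sizeof *nodes, (size_t)*count, f) != (size_t)*count) {
        free(nodes);
        nodes = NULL;
    }
    return nodes;
}

/* Search from the root (index 0) by following child indices; trusts the file. */
int find_key(const disk_node *nodes, int32_t key, int32_t *value) {
    const disk_node *n = &nodes[0];
    while (!n->is_leaf) {
        int i = 0;
        while (i < n->num_keys && key >= n->keys[i])
            i++;
        n = &nodes[n->refs[i]];
    }
    for (int i = 0; i < n->num_keys; i++) {
        if (n->keys[i] == key) {
            *value = n->refs[i];
            return 1;
        }
    }
    return 0;
}
```

The trade-off is that the loaded tree uses indices rather than bpt.c's pointers, so bpt.c's functions would need adapting. Dumping structs directly also ties the file to the compiler's struct layout and the machine's byte order, which is acceptable for a file read back by the same program but not for a portable format.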
