
C/C++: How to store data in a file in B tree

It appears to me that a B-tree can be stored efficiently in a file with C by using a binary file that holds a sequence (array) of structs, with each struct representing a node. The individual nodes can then be connected with an approach similar to creating linked lists using arrays. But the problem that crops up is deletion of a node, as erasing only a few bytes in the middle of a huge file is not possible.

One way of deleting could be to keep track of 'empty' nodes until a threshold is reached and then write another file that discards the empty nodes. But this is tedious.

Is there a better approach, from the simplicity/efficiency point of view, for deleting from, or even representing, a B-tree in a file?

TIA, -Sviiya

For implementing B-Trees in a file, you can use file offsets instead of pointers. Also, you can implement a "file memory manager", so that you can re-use the space of deleted items in the file.
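A minimal sketch of that idea, under some assumptions of my own: fixed-size nodes, a small header at offset 0, and a free list threaded through the deleted node slots. The names (`Header`, `Node`, `ORDER`, `NIL`, `bt_init`, `node_alloc`, `node_free`) are hypothetical and not from any library. It only shows allocation and reuse of node slots, not B-tree insertion or balancing.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define ORDER 4              /* hypothetical: max children per node */
#define NIL   ((int64_t)-1)  /* plays the role of a null pointer */

/* Header stored at offset 0 of the file. */
typedef struct {
    int64_t root;       /* offset of the root node, or NIL */
    int64_t free_head;  /* offset of the first free node slot, or NIL */
} Header;

/* Fixed-size node; children are file offsets, not pointers.
   While a slot sits on the free list, child[0] is the "next free" link. */
typedef struct {
    int32_t nkeys;
    int32_t keys[ORDER - 1];
    int64_t child[ORDER];
} Node;

/* Note: fseek takes a long, so this sketch is limited to files that fit in long. */
static int read_at(FILE *f, int64_t off, void *buf, size_t n) {
    if (fseek(f, (long)off, SEEK_SET) != 0) return -1;
    return fread(buf, n, 1, f) == 1 ? 0 : -1;
}

static int write_at(FILE *f, int64_t off, const void *buf, size_t n) {
    if (fseek(f, (long)off, SEEK_SET) != 0) return -1;
    return fwrite(buf, n, 1, f) == 1 ? 0 : -1;
}

/* Write an empty header; the file must be opened in binary update mode. */
int bt_init(FILE *f) {
    Header h = { NIL, NIL };
    return write_at(f, 0, &h, sizeof h);
}

/* Return the offset of a usable node slot: a recycled one if the free
   list is non-empty, otherwise a new zeroed slot appended to the file. */
int64_t node_alloc(FILE *f) {
    Header h;
    Node n;
    if (read_at(f, 0, &h, sizeof h) != 0) return NIL;
    if (h.free_head != NIL) {
        int64_t off = h.free_head;
        if (read_at(f, off, &n, sizeof n) != 0) return NIL;
        h.free_head = n.child[0];            /* pop the free list */
        if (write_at(f, 0, &h, sizeof h) != 0) return NIL;
        return off;
    }
    if (fseek(f, 0, SEEK_END) != 0) return NIL;
    long end = ftell(f);
    if (end < 0) return NIL;
    memset(&n, 0, sizeof n);
    if (write_at(f, end, &n, sizeof n) != 0) return NIL;
    return end;
}

/* "Delete" a node by pushing its slot onto the free list. */
int node_free(FILE *f, int64_t off) {
    Header h;
    Node n;
    assert(off >= (int64_t)sizeof(Header));  /* never free the header */
    if (read_at(f, 0, &h, sizeof h) != 0) return -1;
    memset(&n, 0, sizeof n);
    n.child[0] = h.free_head;
    if (write_at(f, off, &n, sizeof n) != 0) return -1;
    h.free_head = off;
    return write_at(f, 0, &h, sizeof h);
}
```

With this layout the file never shrinks, but a freed slot is handed out again by the next `node_alloc`, so nothing in the middle of the file has to be erased. Writing raw structs ties the file to one compiler and byte order, which may or may not matter for your use.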

To fully reclaim the space of deleted blocks in a B-Tree file, you will have to recreate the B-Tree in a new file. Also remember that standard C has no function for truncating a file in place; platform calls such as POSIX ftruncate exist but are not portable. A portable method for truncating a file is to write a new file and destroy the old one.
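One way to recreate the tree in a new file, sketched under the assumption of a fixed-size node layout with a header at offset 0 (the `Header`, `Node`, `ORDER`, `NIL`, `copy_subtree` and `bt_compact` names are hypothetical): walk the live tree from the root and append each node to the new file, children first, rewriting the child offsets as you go. Free or dead slots are never visited, so they are simply left behind. The sketch also assumes unused child slots hold `NIL`.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define ORDER 4              /* hypothetical: max children per node */
#define NIL   ((int64_t)-1)

typedef struct { int64_t root; int64_t free_head; } Header;
typedef struct {
    int32_t nkeys;
    int32_t keys[ORDER - 1];
    int64_t child[ORDER];    /* file offsets; unused slots hold NIL */
} Node;

/* Copy the subtree rooted at `off` in src to the end of dst (children
   before parent) and return the node's new offset, or NIL on error. */
static int64_t copy_subtree(FILE *src, FILE *dst, int64_t off) {
    Node n;
    if (fseek(src, (long)off, SEEK_SET) != 0) return NIL;
    if (fread(&n, sizeof n, 1, src) != 1) return NIL;
    if (n.nkeys < 0 || n.nkeys >= ORDER) return NIL;   /* corrupt node */
    for (int i = 0; i <= n.nkeys; i++) {
        if (n.child[i] != NIL) {
            n.child[i] = copy_subtree(src, dst, n.child[i]);
            if (n.child[i] == NIL) return NIL;         /* propagate error */
        }
    }
    if (fseek(dst, 0, SEEK_END) != 0) return NIL;
    long at = ftell(dst);
    if (at < 0 || fwrite(&n, sizeof n, 1, dst) != 1) return NIL;
    return at;
}

/* Rebuild the tree stored in src as a compact tree in dst. */
int bt_compact(FILE *src, FILE *dst) {
    Header h;
    assert(src != NULL && dst != NULL);
    if (fseek(src, 0, SEEK_SET) != 0) return -1;
    if (fread(&h, sizeof h, 1, src) != 1) return -1;
    int64_t old_root = h.root;
    h.root = NIL;
    h.free_head = NIL;                                 /* new file has no holes */
    if (fseek(dst, 0, SEEK_SET) != 0) return -1;
    if (fwrite(&h, sizeof h, 1, dst) != 1) return -1;  /* reserve header space */
    if (old_root != NIL) {
        h.root = copy_subtree(src, dst, old_root);
        if (h.root == NIL) return -1;
        if (fseek(dst, 0, SEEK_SET) != 0) return -1;
        if (fwrite(&h, sizeof h, 1, dst) != 1) return -1;
    }
    return fflush(dst) == 0 ? 0 : -1;
}
```

After a successful `bt_compact`, the caller would close both files and swap them with the standard `remove` and `rename` functions. If the tree also points at separately stored data items, those offsets would need the same remapping.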

Another suggestion is to partition the file into a B-Tree partition and a data (item) partition. The B-Tree partition will contain the pages. The leaf pages will contain offsets to the data items. The data partition will be a section of the file containing the data items. You may end up creating more than one of each partition, and the partitions may be interleaved.
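As a rough illustration of leaf pages holding offsets to data items, here is a sketch in which each item is stored as a length-prefixed record and a leaf entry keeps only the key and the record's offset. The `LeafEntry`, `data_append` and `data_read` names and the record format are assumptions for the example. Items are appended wherever the file currently ends, so data records and pages may end up interleaved.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* What a leaf page would store per key: the key and where its item lives. */
typedef struct {
    int32_t key;
    int64_t data_off;   /* file offset of the length-prefixed item */
} LeafEntry;

/* Append one item as [uint32 length][bytes] and return its offset, or -1. */
int64_t data_append(FILE *f, const void *item, uint32_t len) {
    assert(f != NULL);
    if (fseek(f, 0, SEEK_END) != 0) return -1;
    long off = ftell(f);
    if (off < 0) return -1;
    if (fwrite(&len, sizeof len, 1, f) != 1) return -1;
    if (len > 0 && fwrite(item, len, 1, f) != 1) return -1;
    return off;
}

/* Read the item at `off` into buf (capacity cap).
   Returns the item length, or -1 on error or if buf is too small. */
long data_read(FILE *f, int64_t off, void *buf, uint32_t cap) {
    uint32_t len;
    assert(f != NULL);
    if (fseek(f, (long)off, SEEK_SET) != 0) return -1;
    if (fread(&len, sizeof len, 1, f) != 1) return -1;
    if (len > cap) return -1;
    if (len > 0 && fread(buf, len, 1, f) != 1) return -1;
    return (long)len;
}
```

Keeping the items out of the pages lets the pages stay fixed-size even when the items are not, at the cost of one extra seek per lookup.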

I spent a lot of time playing with a file-based B-Tree, until I gave up and decided to let a database program (or server) handle the data for me.

I did a very quick search and dug up this: http://people.csail.mit.edu/jaffer/WB (C source: http://cvs.savannah.gnu.org/viewvc/wb/wb/c/). It seems to offer disk-based B-tree style databases, although a look at "delete.c" seemed to imply that if you delete a node, everything below it is taken out as well. If that's the correct behaviour, then it sounds like something that might help?

Also, B-trees are often used in filesystems. Could you take a look at some filesystem code?

My own inclination is the file-system approach: if you have a B-tree of fixed size, whenever you "delete" a node, rather than attempting to remove the reference, just set the value to whatever means nothing in your code. Then have a clean-up thread running that checks whether anyone has the file open for reading and, if all's quiet, locks the file and tidies up.
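The "set the value to whatever means nothing" part could look something like the sketch below. It assumes a file of fixed-size key/value slots and reserves `INT32_MIN` as the "means nothing" value; `Slot`, `TOMBSTONE`, `slot_is_live` and `slot_delete` are made-up names. The clean-up thread and file locking are platform-specific and are left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define TOMBSTONE INT32_MIN   /* assumed never to be a real value */

typedef struct {
    int32_t key;
    int32_t value;
} Slot;

/* Read slot number `index` (0-based); returns 0 on success, -1 on error. */
static int slot_read(FILE *f, long index, Slot *out) {
    if (index < 0) return -1;
    if (fseek(f, index * (long)sizeof *out, SEEK_SET) != 0) return -1;
    return fread(out, sizeof *out, 1, f) == 1 ? 0 : -1;
}

/* 1 if the slot holds a real value, 0 if it was deleted, -1 on error. */
int slot_is_live(FILE *f, long index) {
    Slot s;
    assert(f != NULL);
    if (slot_read(f, index, &s) != 0) return -1;
    return s.value != TOMBSTONE;
}

/* "Delete" in place: overwrite the value, leave the file size unchanged. */
int slot_delete(FILE *f, long index) {
    Slot s;
    assert(f != NULL);
    if (slot_read(f, index, &s) != 0) return -1;  /* also rejects slots past EOF */
    s.value = TOMBSTONE;
    if (fseek(f, index * (long)sizeof s, SEEK_SET) != 0) return -1;
    if (fwrite(&s, sizeof s, 1, f) != 1) return -1;
    return fflush(f) == 0 ? 0 : -1;
}
```

Lookups then skip any slot for which `slot_is_live` returns 0, and the tidy-up pass can rewrite the file without those slots whenever it gets exclusive access.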

You can use Berkeley DB as well. It works well with C programs and implements a B+ tree.


 