

How to implement a B+ tree for file systems?

Original post: 2010-04-09 06:19:20 · Tags: c, algorithm, b-tree

I have a text file that contains extent information for every file in the file system, like this:

    C:\Program Files\abcd.txt 12345 100 23456 200
    C:\Program Files\bcde.txt 56789 50 26746 300
    ...

Now I have another binary that needs to find the extents of all these files. Currently I use a linear search through the text file above to find each file's extent info, which is time-consuming. Is there a better way to code this, for example by implementing a good data structure such as a B-tree? If a B+ tree is used, what should the key be, and what branch factor should I use?
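The natural lookup key here is the file path. Below is a minimal C sketch of extracting that key and its numbers from one line. It assumes something the question does not state: each line is a path followed by pairs of decimal integers, read here as (start, length) extents. `parse_line`, `Extent`, `FileExtents`, `MAX_EXTENTS` and `PATH_CAP` are hypothetical names. Paths such as `C:\Program Files\abcd.txt` contain spaces, so the sketch reads numeric tokens from the end of the line.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define MAX_EXTENTS 16   /* hypothetical cap for this sketch */
#define PATH_CAP    260  /* Windows MAX_PATH; an assumption about path length */

/* Assumed meaning of each number pair: (start, length). */
typedef struct {
    unsigned long long start;
    unsigned long long length;
} Extent;

typedef struct {
    char   path[PATH_CAP];
    Extent extents[MAX_EXTENTS];
    size_t count;
} FileExtents;

/* Parse "PATH N N N N ..." where PATH may contain spaces.
 * Numeric tokens are collected from the end of the line backwards; the first
 * non-numeric token marks the end of the path.
 * Returns 0 on success, -1 on malformed input. */
int parse_line(const char *line, FileExtents *out)
{
    unsigned long long nums[2 * MAX_EXTENTS];
    size_t n = 0;
    size_t end = strlen(line);

    while (end > 0) {
        /* skip whitespace, including a trailing newline */
        while (end > 0 && isspace((unsigned char)line[end - 1]))
            end--;
        size_t begin = end;
        while (begin > 0 && isdigit((unsigned char)line[begin - 1]))
            begin--;
        /* stop at a token that is empty or not purely numeric (e.g. "abc123") */
        if (begin == end || (begin > 0 && !isspace((unsigned char)line[begin - 1])))
            break;
        if (n == 2 * MAX_EXTENTS)
            return -1;
        nums[n++] = strtoull(line + begin, NULL, 10);
        end = begin;
    }

    if (n == 0 || n % 2 != 0 || end == 0 || end >= PATH_CAP)
        return -1;

    memcpy(out->path, line, end);
    out->path[end] = '\0';

    /* nums[] holds the numbers in reverse order; restore file order */
    out->count = n / 2;
    for (size_t i = 0; i < out->count; i++) {
        out->extents[i].start  = nums[n - 1 - 2 * i];
        out->extents[i].length = nums[n - 2 - 2 * i];
    }
    return 0;
}
```

A path whose last space-separated word is itself a number would be misread as an extent. A real format would be safer with a delimiter or quoting around the path.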

2 answers

Use a database.

The key points in implementing a tree in a file are to use fixed-length records and to use file offsets instead of pointers.
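Here is a minimal C sketch that makes the fixed-length-record and offset ideas concrete. `DiskNode`, `append_node`, `read_node`, `KEY_LEN` and `ORDER` are hypothetical names. The small fixed key width is an assumption chosen for illustration, and it is exactly the limitation that makes string keys awkward. Every node has the same size, so a single `fseek` can locate any node. Child links are stored as file offsets, so they stay valid after the program exits.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 64  /* hypothetical fixed key width; longer paths need another scheme */
#define ORDER   4   /* hypothetical max keys per node, small for illustration */

/* Fixed-size node record: every node occupies exactly sizeof(DiskNode) bytes,
 * and links to other nodes are file offsets rather than memory pointers, so a
 * node read back later still refers to the right place in the file. */
typedef struct {
    uint32_t is_leaf;
    uint32_t num_keys;
    char     keys[ORDER][KEY_LEN];
    int64_t  ptrs[ORDER + 1]; /* interior: child node offsets; leaf: values */
} DiskNode;

/* Append a node to the end of the file; returns its offset or -1 on error. */
long append_node(FILE *f, const DiskNode *n)
{
    if (fseek(f, 0, SEEK_END) != 0)
        return -1;
    long off = ftell(f);
    if (off < 0 || fwrite(n, sizeof *n, 1, f) != 1)
        return -1;
    return off;
}

/* Read the node stored at `off`; returns 0 on success, -1 on error. */
int read_node(FILE *f, long off, DiskNode *n)
{
    if (fseek(f, off, SEEK_SET) != 0)
        return -1;
    return fread(n, sizeof *n, 1, f) == 1 ? 0 : -1;
}
```

Writing the struct directly with `fwrite` ties the file format to one compiler and CPU because of padding and byte order. A portable version would serialize each field explicitly.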

Use a database. Hmmm, SQLite.

Another point to consider with files is that reading data in chunks is faster than reading individual items (regardless of whether the hard disk or the OS has a cache). I implemented a B+ tree that uses pages as its nodes.
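If a node is exactly one page, the branch factor falls out of the page size and the key size, which answers the question's branch-factor part under stated assumptions. This C sketch assumes a 4096-byte page and 64-byte fixed keys, both hypothetical choices. It loads a node with one `fread` of a whole page, then binary-searches the separator keys inside it. `Page`, `PageNode`, `FANOUT`, `load_page` and `find_child` are illustrative names.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define PAGE_SIZE 4096 /* assumed disk/OS page size */
#define KEY_LEN   64   /* hypothetical fixed key width */

/* Branch factor derived from the page size:
 * header (two uint32_t) + FANOUT keys + (FANOUT + 1) child offsets must fit. */
#define FANOUT ((PAGE_SIZE - 2 * sizeof(uint32_t) - sizeof(int64_t)) / \
                (KEY_LEN + sizeof(int64_t)))

typedef struct {
    uint32_t is_leaf;
    uint32_t num_keys;
    char     keys[FANOUT][KEY_LEN];
    int64_t  ptrs[FANOUT + 1];
} PageNode;

/* One node = one page, padded so every node starts on a page boundary. */
typedef union {
    PageNode      node;
    unsigned char raw[PAGE_SIZE];
} Page;

_Static_assert(sizeof(Page) == PAGE_SIZE, "node must fit in exactly one page");

/* Load page number `page_no` with a single read of PAGE_SIZE bytes. */
int load_page(FILE *f, long page_no, Page *p)
{
    if (fseek(f, page_no * PAGE_SIZE, SEEK_SET) != 0)
        return -1;
    return fread(p->raw, PAGE_SIZE, 1, f) == 1 ? 0 : -1;
}

/* Binary search inside an interior node: returns the index of the child that
 * may contain `key`, i.e. the number of separator keys <= key. */
uint32_t find_child(const PageNode *n, const char *key)
{
    uint32_t lo = 0, hi = n->num_keys;
    while (lo < hi) {
        uint32_t mid = lo + (hi - lo) / 2;
        if (strcmp(key, n->keys[mid]) >= 0)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}
```

With these assumed sizes, FANOUT works out to 56 keys (57 children) per node. Larger pages or shorter keys raise it, and a higher fanout means fewer levels and therefore fewer page reads per lookup.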

Use a database. Databases have already been written and tested.

A more efficient design is to keep the root node in memory. This reduces the number of reads from the file. If your program has the space, keeping the first couple of levels in memory may also speed up execution.
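Here is a minimal C sketch of caching the root. It assumes fixed-size nodes with offset links. `Node`, `Tree`, `tree_open` and `tree_lookup` are hypothetical names, and the tiny `ORDER` is only for illustration. `tree_open` reads the root once, and each lookup then starts in memory, so a tree with h levels costs h - 1 node reads instead of h. The `reads` counter exists only to make that saving visible.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define KEY_LEN 32 /* hypothetical fixed key width */
#define ORDER   4  /* hypothetical max keys per node */

typedef struct {
    uint32_t is_leaf;
    uint32_t num_keys;
    char     keys[ORDER][KEY_LEN];
    int64_t  ptrs[ORDER + 1]; /* interior: child offsets; leaf: values */
} Node;

typedef struct {
    FILE    *f;
    Node     root;  /* read once at open time and kept in memory */
    unsigned reads; /* node reads performed by lookups (for illustration) */
} Tree;

static int read_node(FILE *f, long off, Node *n)
{
    if (fseek(f, off, SEEK_SET) != 0)
        return -1;
    return fread(n, sizeof *n, 1, f) == 1 ? 0 : -1;
}

int tree_open(Tree *t, FILE *f, long root_off)
{
    t->f = f;
    t->reads = 0;
    return read_node(f, root_off, &t->root);
}

/* Look up `key`; returns 0 and stores the value on a hit, -1 otherwise.
 * The root is searched in memory, so only lower levels touch the file. */
int tree_lookup(Tree *t, const char *key, int64_t *value)
{
    Node buf;
    const Node *n = &t->root;
    while (!n->is_leaf) {
        uint32_t i = 0; /* linear scan for brevity; binary search also works */
        while (i < n->num_keys && strcmp(key, n->keys[i]) >= 0)
            i++;
        if (read_node(t->f, (long)n->ptrs[i], &buf) != 0)
            return -1;
        t->reads++;
        n = &buf;
    }
    for (uint32_t i = 0; i < n->num_keys; i++) {
        if (strcmp(key, n->keys[i]) == 0) {
            *value = n->ptrs[i];
            return 0;
        }
    }
    return -1;
}
```

Keeping the first couple of levels in memory extends the same idea: hold those nodes in an array or a small cache keyed by file offset, and only go to the file below that depth.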

Use a database.

I gave up writing a B-tree implementation for my application because I wanted to concentrate on the program's other functionality. I later learned that in the real world (the world where programs must be finished on schedule), time should be spent on the core of the application rather than on accessories that have already been written and tested (aka off-the-shelf components).

It depends on how you want to search the file. I assume you want to look up the info for a given file name. In that case a hash table or a trie would be a good data structure to use.

A B-tree is possible, but it is not the most convenient choice given that your keys are strings.
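Here is a minimal C sketch of the hash-table option, assuming the whole index fits in memory. It maps each path to a `long` value, which could for example be the byte offset of that path's line in the text file so that the line can be read with one `fseek`; that use of the value is an assumption. The hash function is 64-bit FNV-1a and collisions are handled by separate chaining. `PathTable`, `table_put`, `table_get`, `table_free` and `NBUCKETS` are hypothetical names. Keys are compared case-sensitively, which is also an assumption: Windows paths are case-insensitive, so a real version might normalize case first.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 4096 /* hypothetical; size roughly to the number of files */

typedef struct Entry {
    char         *path;
    long          value; /* e.g. byte offset of the path's line (assumption) */
    struct Entry *next;  /* separate chaining for collisions */
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
} PathTable; /* must start zero-initialized */

/* 64-bit FNV-1a hash of a NUL-terminated string. */
static uint64_t fnv1a(const char *s)
{
    uint64_t h = 14695981039346656037ULL;
    for (; *s; s++) {
        h ^= (unsigned char)*s;
        h *= 1099511628211ULL;
    }
    return h;
}

/* Insert or update; returns 0 on success, -1 if out of memory. */
int table_put(PathTable *t, const char *path, long value)
{
    size_t b = (size_t)(fnv1a(path) % NBUCKETS);
    for (Entry *e = t->buckets[b]; e; e = e->next) {
        if (strcmp(e->path, path) == 0) {
            e->value = value;
            return 0;
        }
    }
    Entry *e = malloc(sizeof *e);
    size_t len = strlen(path) + 1;
    char *copy = malloc(len);
    if (!e || !copy) {
        free(e);
        free(copy);
        return -1;
    }
    memcpy(copy, path, len);
    e->path = copy;
    e->value = value;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    return 0;
}

/* Returns 0 and stores the value if found, -1 otherwise (average O(1)). */
int table_get(const PathTable *t, const char *path, long *value)
{
    for (const Entry *e = t->buckets[fnv1a(path) % NBUCKETS]; e; e = e->next) {
        if (strcmp(e->path, path) == 0) {
            *value = e->value;
            return 0;
        }
    }
    return -1;
}

void table_free(PathTable *t)
{
    for (size_t b = 0; b < NBUCKETS; b++) {
        Entry *e = t->buckets[b];
        while (e) {
            Entry *next = e->next;
            free(e->path);
            free(e);
            e = next;
        }
        t->buckets[b] = NULL;
    }
}
```

Unlike a B-tree, this needs no fixed-width keys and no ordering, which fits exact-match lookups by file name. It does not support range or prefix queries; a trie would.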