
Why does DJ Bernstein's CDB (constant database) use 256 hash tables?

Asked 2014-11-10 19:05:28 · tags: cdb

Why was DJB's CDB (constant database) designed to use 256 hash tables?

Why not a single, bigger 252 * 256 hash table?

Is it only to save space, or is there some other reason?

1 Answer

DJB CDB uses two levels of hash tables. The first table has a fixed size of 2K (256 entries of 8 bytes each) and sits at the beginning of the file. The second set of tables sits at the end of the file and is built in memory as data is streamed into the cdb. Once all data has been streamed in, the second set of hash tables is streamed out to disk, and then the first table (at the beginning of the file) is filled in with the offset and slot count of each table in the second set.
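To make the two levels concrete, here is a minimal Python sketch of how a key is routed, assuming the classic cdb format (little-endian 32-bit integers, a 256-entry header, and cdb's hash `h = ((h << 5) + h) ^ c` starting from 5381). The low 8 bits of the hash pick one of the 256 second-level tables, and the remaining bits pick the slot where probing starts inside it. The helper name `route` is hypothetical, not part of cdb's or mcdb's API.

```python
# Sketch of key routing in the classic cdb layout (assumed format;
# `route` is a hypothetical helper, not a cdb/mcdb function).

HEADER_ENTRIES = 256               # one header entry per second-level table
HEADER_SIZE = HEADER_ENTRIES * 8   # (position, slot count) as two uint32s = 2048 bytes


def cdb_hash(key: bytes) -> int:
    """cdb's hash: start at 5381, then h = ((h << 5) + h) ^ c, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def route(key: bytes, slots: int) -> tuple[int, int]:
    """Return (table index in the 2K header, first slot to probe in that table)."""
    h = cdb_hash(key)
    # Low 8 bits choose one of the 256 tables; the remaining bits choose the start slot.
    return h % HEADER_ENTRIES, (h >> 8) % slots


if __name__ == "__main__":
    print(HEADER_SIZE)       # 2048
    print(route(b"a", 4))    # (196, 1)
```

Because the header has exactly 256 entries, its size is fixed at 2048 bytes no matter how many records the database holds, which is what lets it be reserved up front and filled in last.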

In other words, the two-level layout lets a cdb be created in a single streaming pass. The only exception is the first 2K of the file, which is written last, once everything else in the cdb has been written.
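The streaming build can be sketched in Python as follows. This is a minimal illustration assuming the classic cdb format (each record is key length, data length, key, data; each table slot is a (hash, record position) pair, and position 0 marks an empty slot) and cdbmake's choice of twice as many slots as entries per table. The functions `write_cdb` and `build_cdb` are hypothetical names, not cdb's or mcdb's API.

```python
import io
import struct


def cdb_hash(key: bytes) -> int:
    """cdb's hash: start at 5381, then h = ((h << 5) + h) ^ c, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def write_cdb(out, records) -> None:
    """Stream (key, value) pairs into a seekable binary file `out` in cdb layout (sketch)."""
    out.write(b"\0" * 2048)              # reserve the 256-entry header; filled in last
    pos = 2048
    buckets = [[] for _ in range(256)]   # in-memory (hash, record position) lists per table
    for key, value in records:           # records are written strictly in order
        out.write(struct.pack("<II", len(key), len(value)) + key + value)
        h = cdb_hash(key)
        buckets[h & 0xFF].append((h, pos))
        pos += 8 + len(key) + len(value)
    header = []
    for entries in buckets:              # stream out the second-level tables
        nslots = 2 * len(entries)        # assumption: 2 slots per entry, as cdbmake does
        table = [(0, 0)] * nslots
        for h, rpos in entries:
            i = (h >> 8) % nslots
            while table[i][1] != 0:      # linear probing; position 0 means "empty"
                i = (i + 1) % nslots
            table[i] = (h, rpos)
        header.append((pos, nslots))
        out.write(b"".join(struct.pack("<II", th, tp) for th, tp in table))
        pos += 8 * nslots
    out.seek(0)                          # the only non-sequential write
    out.write(b"".join(struct.pack("<II", p, n) for p, n in header))


def build_cdb(records) -> bytes:
    """Convenience wrapper: build a cdb image in memory."""
    buf = io.BytesIO()
    write_cdb(buf, records)
    return buf.getvalue()
```

Everything after the reserved 2K is appended sequentially; only the final `seek(0)` goes back, to write the header that points at the tables.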

Access to the cdb is fast: a lookup reads the first table (the 2K at the beginning of the file) to find the offset of one table in the second set at the end of the cdb file, and that table gives the location of the data in the cdb.
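A lookup against an in-memory cdb image might look like the Python sketch below, again assuming the classic cdb format: one read of the 2K header, then linear probing in the selected second-level table until the key's record or an empty slot (position 0) is found. `cdb_get` is a hypothetical name; real cdb readers typically work on a file descriptor or a memory map rather than a `bytes` object.

```python
import struct


def cdb_hash(key: bytes) -> int:
    """cdb's hash: start at 5381, then h = ((h << 5) + h) ^ c, kept to 32 bits."""
    h = 5381
    for c in key:
        h = (((h << 5) + h) ^ c) & 0xFFFFFFFF
    return h


def cdb_get(data: bytes, key: bytes):
    """Return the first value stored for `key` in cdb image `data`, or None (sketch)."""
    h = cdb_hash(key)
    # Level 1: the header entry selected by the low 8 bits of the hash.
    tpos, nslots = struct.unpack_from("<II", data, (h & 0xFF) * 8)
    if nslots == 0:
        return None
    # Level 2: probe the selected table, starting at slot (h >> 8) % nslots.
    slot = (h >> 8) % nslots
    for _ in range(nslots):
        sh, rpos = struct.unpack_from("<II", data, tpos + slot * 8)
        if rpos == 0:                    # empty slot: the key is absent
            return None
        if sh == h:                      # hash matches; confirm the key itself
            klen, dlen = struct.unpack_from("<II", data, rpos)
            start = rpos + 8
            if data[start:start + klen] == key:
                return data[start + klen:start + klen + dlen]
        slot = (slot + 1) % nslots
    return None
```

A successful lookup touches one header entry, usually one or two slots (each table has spare slots), and the record itself.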

Further information can be found in the NOTES at https://github.com/gstrauss/mcdb/, a rewrite of DJB's venerable cdb. Among other benefits, mcdb is faster than cdb and removes cdb's 4GB size limit.