

cdb - constant key-value store for large files (hundreds of GB)

Original post: 2012-03-15 16:48:06 · Tags: database, key-value, key-value-store, cdb

I need a tool similar to cdb (constant database) that lets me store large data sets (in the range of hundreds of gigabytes) in indexed files. cdb would be an ideal candidate, but it has a 2 GB file size limit, so it's not suitable. What I'm looking for is a persistent key-value store that supports binary keys and values. Once created, the database is read-only and will never be modified. Can you recommend a tool? Storage overhead should also be small, because I will be storing billions of records.

BTW, I'm looking for an embeddable database library, not a standalone server: something that can be used inside a C program.

Thanks, RG

3 Answers

Another option is mcdb, which is extended from Dan J. Bernstein's cdb.

https://github.com/gstrauss/mcdb/

mcdb supports very large constant databases and is faster than cdb, both for database creation and database access. Still, creating a database of hundreds of gigabytes can take a bit of time. mcdb can create a gigabyte-sized database in a few seconds when the data is already cached, or in a minute or so when starting from a cold cache.

https://github.com/gstrauss/mcdb/blob/master/t/PERFORMANCE

(Disclosure: I am the author of mcdb)

If your values are large and your keys are small, you could also consider redis: http://redis.io

There's hamsterdb (I'm the author), berkeleydb, and tokyo cabinet.

hamsterdb uses a btree and therefore sorts your data. tokyo cabinet is a hash table and therefore not sorted. berkeleydb can do both.

Needless to say what I would recommend ;)

All of them can be linked into a C application. None of them should have a 2GB limit.

bye Christoph
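For readers wondering what a write-once, hash-indexed store without a small file size limit might look like, here is a minimal C sketch. It is not the on-disk format of cdb, mcdb, or any library named in this thread: the layout, `struct kv_item`, `kv_build`, and `kv_get` are all hypothetical names made up for illustration. The only point it tries to show is that storing record offsets as 64-bit integers (and seeking with a 64-bit `off_t`, assumed here via POSIX `fseeko`) removes the ceiling that 32-bit offsets impose. Keys and values are treated as raw bytes with explicit lengths, and lookups are unsorted, as with any hash table.

```c
#define _POSIX_C_SOURCE 200809L
#define _FILE_OFFSET_BITS 64  /* request a 64-bit off_t on 32-bit POSIX systems */
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical input record: binary key and value with explicit lengths. */
struct kv_item { const void *key; uint32_t klen; const void *val; uint32_t vlen; };

/* FNV-1a, used only as a simple stand-in hash function. */
static uint64_t hash_bytes(const void *p, uint32_t n) {
    const unsigned char *s = p;
    uint64_t h = 14695981039346656037ULL;
    while (n--) { h ^= *s++; h *= 1099511628211ULL; }
    return h;
}

/* Illustrative layout (host byte order, so not portable across machines):
   [nslots:u64][slot table: nslots x u64 offsets][records...]
   record = [klen:u32][vlen:u32][key bytes][value bytes]; slot 0 means empty. */
int kv_build(const char *path, const struct kv_item *items, uint64_t n) {
    assert(n < UINT64_MAX / 16);           /* keep the size arithmetic in range */
    uint64_t nslots = n * 2 + 1;           /* table is never more than half full */
    uint64_t *slots = calloc(nslots, sizeof *slots);
    FILE *f = fopen(path, "wb");
    if (!slots || !f) { free(slots); if (f) fclose(f); return -1; }
    fwrite(&nslots, 8, 1, f);
    fwrite(slots, 8, nslots, f);           /* placeholder table, rewritten below */
    uint64_t off = 8 + nslots * 8;         /* offset of the first record */
    for (uint64_t i = 0; i < n; i++) {
        uint64_t s = hash_bytes(items[i].key, items[i].klen) % nslots;
        while (slots[s]) s = (s + 1) % nslots;   /* linear probing */
        slots[s] = off;
        fwrite(&items[i].klen, 4, 1, f);
        fwrite(&items[i].vlen, 4, 1, f);
        fwrite(items[i].key, 1, items[i].klen, f);
        fwrite(items[i].val, 1, items[i].vlen, f);
        off += 8 + (uint64_t)items[i].klen + items[i].vlen;
    }
    fseeko(f, 8, SEEK_SET);                /* go back and write the real table */
    fwrite(slots, 8, nslots, f);
    free(slots);
    int ok = !ferror(f);
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Returns a malloc'd copy of the value (caller frees), or NULL if not found. */
void *kv_get(FILE *f, const void *key, uint32_t klen, uint32_t *vlen) {
    uint64_t nslots, off;
    if (fseeko(f, 0, SEEK_SET) != 0 || fread(&nslots, 8, 1, f) != 1 || !nslots)
        return NULL;
    uint64_t s = hash_bytes(key, klen) % nslots;
    for (uint64_t i = 0; i < nslots; i++, s = (s + 1) % nslots) {
        uint32_t hdr[2];
        fseeko(f, (off_t)(8 + s * 8), SEEK_SET);
        if (fread(&off, 8, 1, f) != 1 || off == 0) break;  /* empty slot: absent */
        fseeko(f, (off_t)off, SEEK_SET);
        if (fread(hdr, 4, 2, f) != 2) break;
        if (hdr[0] != klen) continue;      /* different key length: keep probing */
        unsigned char *k = malloc((size_t)klen + 1);
        int same = k && fread(k, 1, klen, f) == klen && memcmp(k, key, klen) == 0;
        free(k);
        if (!same) continue;
        void *out = malloc((size_t)hdr[1] + 1);
        if (out && fread(out, 1, hdr[1], f) != hdr[1]) { free(out); out = NULL; }
        if (out) *vlen = hdr[1];
        return out;
    }
    return NULL;
}
```

To use it, fill an array of `struct kv_item`, call `kv_build` once, then `fopen` the file in `"rb"` mode and call `kv_get` for each lookup. The per-record overhead in this sketch is 8 bytes of lengths plus roughly 16 bytes of slot table, which may already be too much for billions of small records; a real library would likely use `mmap` instead of `fseeko`/`fread` and a more compact index.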