
Which database should I use to store records, and how should I use it?

I'm developing an application that will store a sizeable number of records. These records will be something like (URL, date, title, source, {optional data...}).

As this is a client-side app, I don't want to use a database server; I just want the info stored in files.

I want the files to be readable from various languages (at least Python and C++), so something language-specific like Python's pickle is out of the question.

I am seeing two possibilities: sqlite and BerkeleyDB. As my use case is clearly not relational, I am tempted to go with BerkeleyDB. However, I don't really know how I should use it to store my records, as it only stores key/value pairs.

Is my reasoning correct? If so, how should I use BDB to store my records? Can you link me to relevant info? Or am I missing a better solution?

"I am seeing two possibilities: sqlite and BerkeleyDB. As my use case is clearly not relational, I am tempted to go with BerkeleyDB, however I don't really know how I should use it to store my records, as it only stores key/value pairs."

What you are describing is exactly what relational is about, even if you only need one table. SQLite will probably make this very easy to do.
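As a minimal sketch of the single-table approach, the snippet below uses Python's standard sqlite3 module. The table name, the column names, and the choice to serialize the optional data as JSON in an `extra` column are assumptions made for illustration; they are not part of the question.

```python
import json
import sqlite3

# Hypothetical schema: the columns follow the fields listed in the question.
# ":memory:" keeps the example self-contained; pass a file path to persist.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE records (
        url    TEXT PRIMARY KEY,
        date   TEXT NOT NULL,   -- ISO 8601 text sorts chronologically
        title  TEXT NOT NULL,
        source TEXT NOT NULL,
        extra  TEXT             -- optional data, serialized as JSON
    )
    """
)

def add_record(conn, url, date, title, source, **optional):
    """Insert one record; optional fields go into the JSON 'extra' column."""
    extra = json.dumps(optional) if optional else None
    with conn:  # commits on success, rolls back on an exception
        conn.execute(
            "INSERT INTO records (url, date, title, source, extra) "
            "VALUES (?, ?, ?, ?, ?)",
            (url, date, title, source, extra),
        )

add_record(conn, "http://example.com/a", "2009-11-02", "First post", "feed-1", tags=["db"])
add_record(conn, "http://example.com/b", "2009-11-05", "Second post", "feed-2")

row = conn.execute(
    "SELECT title, extra FROM records WHERE url = ?", ("http://example.com/a",)
).fetchone()
print(row[0], json.loads(row[1]))
```

The resulting database is a single file that the SQLite C library can open from C++ as well.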

EDIT: The relational model doesn't have anything to do with relationships between tables. A relation is a subset of the Cartesian product of other sets. For instance, the Cartesian product of the real numbers, the real numbers, and the real numbers (yes, all three the same) produces 3D coordinate space, and you could define a relation on that space with a formula, say x*y = z. Each possible set of coordinates (x0,y0,z0) is either in the relation, if it satisfies the given formula, or it is not.

A relational database uses this concept with a few additional requirements. First, and most important, the size of the relation must be finite. The product relation given above doesn't satisfy that requirement, because there are infinitely many 3-tuples that satisfy the formula. There are a number of other considerations that have more to do with what is practical or useful on real computers solving real problems.

A better way of thinking about the problem is to think about where each type of persistence mechanism specifically works better than the other. You already recognize that a relational solution makes sense when you have many separate datasets (tables) that must support relationships between them (foreign key constraints), which is almost impossible to enforce with a key-value store. Another real advantage of relational is the way it makes rich, ad-hoc queries possible with the use of proper indexes. This is a consequence of the database layer actually understanding the data that it is representing.

A key-value store has its own set of advantages. One of the more important is the way that key-value stores scale out. It is no coincidence that memcached, couchdb, and hadoop all use key-value storage, because it is easy to distribute key-value lookup across multiple servers. Another area where key-value storage works well is when the key or value is opaque, such as when the stored item is encrypted and only readable by its owner.


To drive this point home, that a relational database works well even when you don't need more than one table, consider the following (not original):

SELECT t1.actor1 
FROM workswith AS t1, 
     workswith AS t2, 
     workswith AS t3, 
     workswith AS t4, 
     workswith AS t5,
     workswith AS t6
WHERE t1.actor2 = t2.actor1 AND
      t2.actor2 = t3.actor1 AND
      t3.actor2 = t4.actor1 AND
      t4.actor2 = t5.actor1 AND
      t5.actor2 = t6.actor1 AND
      t6.actor2 = 'Kevin Bacon';

This obviously uses a single table, workswith, to find every actor who reaches Kevin Bacon through a chain of six workswith links (that is, a Bacon number of at most 6).
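To see the self-join run, here is a small sketch using Python's sqlite3 module. The actor names A0 to A5 and the one-directional rows are made-up test data, not anything from the original answer; the query is the one above, with the string literal in single quotes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE workswith (actor1 TEXT, actor2 TEXT)")

# Made-up data: a single chain A0 -> A1 -> ... -> A5 -> Kevin Bacon.
chain = ["A0", "A1", "A2", "A3", "A4", "A5", "Kevin Bacon"]
conn.executemany(
    "INSERT INTO workswith (actor1, actor2) VALUES (?, ?)",
    list(zip(chain, chain[1:])),
)

query = """
SELECT t1.actor1
FROM workswith AS t1,
     workswith AS t2,
     workswith AS t3,
     workswith AS t4,
     workswith AS t5,
     workswith AS t6
WHERE t1.actor2 = t2.actor1 AND
      t2.actor2 = t3.actor1 AND
      t3.actor2 = t4.actor1 AND
      t4.actor2 = t5.actor1 AND
      t5.actor2 = t6.actor1 AND
      t6.actor2 = 'Kevin Bacon'
"""

# Only A0 sits exactly six links away from Kevin Bacon in this data set.
result = [row[0] for row in conn.execute(query)]
print(result)
```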

BerkeleyDB is good; also look at the *DBM incarnations (e.g. GDBM). The big question, though, is: what do you need to search for? Do you need to search by that URL, by a range of URLs, or by the dates you list?

It is also quite possible to keep groups of records as simple files in the local filesystem, grouped by dates or search terms, etc.

Answering the "search" question is the most important first step.

As for the key/value part, what you need to ensure is that the KEY itself is well defined for your lookups. If, for example, you sometimes need to look up by date and at other times by title, you will need to maintain a "record" row, and then possibly two or more "index" rows that refer to the original record. You can model nearly anything in a key/value store.

Personally, I would use sqlite anyway. It has always just worked for me (and for others I work with). When your app grows and you suddenly do want to do something a little more sophisticated, you won't have to rewrite.

On the other hand, I've seen various comments on the Python dev list about Berkeley DB that suggest it's less than wonderful; you only get dict-style access (what if you want to select certain date ranges or titles instead of URLs?); and it's not even in Python 3's standard set of libraries.

What about MongoDB? I haven't tried it yet, but it seems interesting.

If you're only going to use a single field to look up records, a simple key-value store would be a good choice. Store that single field (or any other unique ID) as your key, serialize each record as a string (using JSON or similar), and store that string as the value. Berkeley DB is certainly a reasonable choice for a key-value store, but there are many alternatives to choose from: http://en.wikipedia.org/wiki/Dbm
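A minimal sketch of that pattern follows. It uses Python's standard dbm module as a stand-in for Berkeley DB, which is an assumption made so the example runs without third-party bindings; the on-disk format depends on which backend dbm picks, so treat this as an illustration of the key/JSON-value idea rather than a cross-language file format. The record contents and the temporary path are made up.

```python
import dbm
import json
import os
import tempfile

# Temporary location so the example does not leave files behind in the cwd.
path = os.path.join(tempfile.mkdtemp(), "records")

record = {
    "url": "http://example.com/a",
    "date": "2009-11-02",
    "title": "First post",
    "source": "feed-1",
    "tags": ["db"],  # optional data simply becomes extra JSON fields
}

# The URL is the key; the whole record, serialized as JSON, is the value.
with dbm.open(path, "c") as db:
    db[record["url"].encode("utf-8")] = json.dumps(record).encode("utf-8")

# Reading back: look up by key, then deserialize the value.
with dbm.open(path, "r") as db:
    loaded = json.loads(db["http://example.com/a".encode("utf-8")].decode("utf-8"))

print(loaded["title"])
```

Because the value is plain JSON, any language with a JSON parser and a binding for the chosen store can read the records.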

If you want to look up records by any of several fields, SQLite might be easiest for development purposes. You'll be writing queries in SQL, but you won't have to maintain a database server. All the multi-key machinery is already written for you.
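As a rough illustration of that multi-key machinery, the sketch below indexes two fields and queries by each of them with Python's sqlite3 module. The table layout, index names, and sample rows are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE records (url TEXT PRIMARY KEY, date TEXT, title TEXT, source TEXT);
    -- One index per field that will be used for lookups.
    CREATE INDEX idx_records_date  ON records (date);
    CREATE INDEX idx_records_title ON records (title);
    """
)

rows = [
    ("http://example.com/a", "2009-10-28", "First post", "feed-1"),
    ("http://example.com/b", "2009-11-05", "Second post", "feed-1"),
    ("http://example.com/c", "2009-11-20", "Third post", "feed-2"),
    ("http://example.com/d", "2009-12-01", "Fourth post", "feed-2"),
]
conn.executemany("INSERT INTO records VALUES (?, ?, ?, ?)", rows)
conn.commit()

# Range lookup by date (ISO 8601 strings compare in chronological order).
by_date = [
    r[0]
    for r in conn.execute(
        "SELECT url FROM records WHERE date BETWEEN ? AND ? ORDER BY date",
        ("2009-11-01", "2009-11-30"),
    )
]

# Exact lookup by title.
by_title = [
    r[0]
    for r in conn.execute("SELECT url FROM records WHERE title = ?", ("Second post",))
]

print(by_date, by_title)
```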

If you really want to avoid SQL or squeeze every bit of performance out of your data store, and you want multi-key access, consider a layer of extra logic on top of a key-value store. It is possible to build column-like behavior on top of key-value stores by serializing your records and inserting the "column" values of each record as additional keys whose values contain the "primary" key of your record. (You're effectively using the key-value store as both a dictionary of records and a dictionary of indexes for finding those records.) Google's App Engine does something like this. You can do this yourself or use one of the various document-oriented databases that will do it for you. For some interesting reading, try googling "nosql": http://www.google.com/search?&q=nosql
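The index-as-extra-keys idea can be sketched roughly as follows. The `rec:` and `idx:` key prefixes, the `put` and `find_by` helpers, and the use of Python's dbm module as the underlying store are all hypothetical choices for illustration. The sketch also ignores index cleanup: changing an indexed field on an existing record would leave a stale index entry behind.

```python
import dbm
import json
import os
import tempfile

INDEXED_FIELDS = ("date", "title")  # hypothetical choice of secondary keys

def put(db, record):
    """Store the record under its URL and add it to each secondary index."""
    url = record["url"]
    db[("rec:" + url).encode("utf-8")] = json.dumps(record).encode("utf-8")
    for field in INDEXED_FIELDS:
        index_key = f"idx:{field}:{record[field]}".encode("utf-8")
        # Each index entry holds a JSON list of primary keys (URLs).
        urls = json.loads(db[index_key]) if index_key in db else []
        if url not in urls:
            urls.append(url)
        db[index_key] = json.dumps(urls).encode("utf-8")

def find_by(db, field, value):
    """Return all records whose indexed field equals value."""
    index_key = f"idx:{field}:{value}".encode("utf-8")
    if index_key not in db:
        return []
    return [
        json.loads(db[("rec:" + url).encode("utf-8")])
        for url in json.loads(db[index_key])
    ]

path = os.path.join(tempfile.mkdtemp(), "records")
with dbm.open(path, "c") as db:
    put(db, {"url": "http://example.com/a", "date": "2009-11-02", "title": "First post"})
    put(db, {"url": "http://example.com/b", "date": "2009-11-02", "title": "Second post"})
    same_day = find_by(db, "date", "2009-11-02")
    by_title = find_by(db, "title", "Second post")
    missing = find_by(db, "title", "No such title")

print([r["url"] for r in same_day], [r["url"] for r in by_title], missing)
```

Note that this gives exact-match lookups only; range queries over dates would need a store with ordered keys or a different index layout.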

OK, so you say you are just storing the data? You really only need a DB for retrieval, lookup, summarising, etc. So, for storing, just use simple text files and append lines. Compress the data if you need to, and use delimiters between fields; just about any language will be able to read such files. If you do want to retrieve, then focus on your retrieval needs: by date, by key, which keys, etc. If you want a simple client side, then you need a simple client DB. SQLite is far easier than BDB, but also look at things like Sybase Advantage (very fast and free for local clients, but not open source), VistaDB, or Firebird, though all of these will require local config/setup/maintenance. If you go with local XML, a 'sizable' number of records will give you some unnecessarily bloated file sizes.
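The append-only text file idea might look something like the sketch below, which writes tab-delimited lines with Python's csv module. The file name, the tab delimiter, and the `append_record` and `read_records` helpers are assumptions for illustration.

```python
import csv
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "records.tsv")

def append_record(path, url, date, title, source):
    """Append one tab-delimited line; the csv module handles quoting."""
    with open(path, "a", newline="", encoding="utf-8") as f:
        csv.writer(f, delimiter="\t").writerow([url, date, title, source])

def read_records(path):
    """Read every line back as a list of fields."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.reader(f, delimiter="\t"))

append_record(path, "http://example.com/a", "2009-11-02", "First post", "feed-1")
append_record(path, "http://example.com/b", "2009-11-05", "Second post", "feed-2")

records = read_records(path)
print(records)
```

Reading requires a full scan of the file, which is the trade-off this answer accepts in exchange for simplicity.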
