

Is a key-value store appropriate for a usage that requires “get all”?

Posted 2015-02-12 04:47:00 · Tags: mongodb, redis, key-value-store, document-store, nosql

Based on the research I've done, I suspect that a key-value store is NOT the way to go, but I wanted more directed input to:

- Determine whether a key-value store is even a viable solution for my use case.
- Be able to articulate my reasons for preferring a document store instead.

To explain my use case:

I have an application that consists of many "documents". These are currently stored in a sort of CMIS repository. The application, however, only ever interacts with these documents after they've been indexed into Elasticsearch. This means that ALL read operations hit Elasticsearch, and all write operations update both Elasticsearch and the repository.

Requested features have revealed that the current repository is much too strict and that there's no reason to enforce a model schema at that level. This, of course, has led to an investigation into NoSQL options.

To load these "documents" into the Elasticsearch index, they need to live somewhere, and I must be able to get all of them and paginate through them as they are loaded into the index (some aggregation also happens at this step, to populate fields that are derived from existing fields).

Right now, the "get all" is actually done in stages by document type, but this requirement may be negotiable: a plain "get all" across all types could suffice, though it would not be ideal.
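A minimal sketch of the staged "get all" on the MongoDB side, using pymongo-style calls. It assumes each document carries a `type` field (a hypothetical name) and pages with a range query on `_id` instead of `skip()`, so later pages don't get slower as the offset grows. The collection is passed in, so it should work with any pymongo `Collection`; treat it as an illustration rather than a finished loader.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_pages(
    collection: Any,
    doc_type: Optional[str] = None,
    batch_size: int = 500,
) -> Iterator[List[Dict[str, Any]]]:
    """Yield every document in `collection` in pages of at most `batch_size`.

    `collection` is expected to behave like a pymongo Collection.
    `doc_type` filters on a hypothetical `type` field; None means all types.
    Pages are ordered by `_id` and each one starts after the last `_id`
    seen (keyset pagination) instead of using skip().
    """
    base: Dict[str, Any] = {} if doc_type is None else {"type": doc_type}
    last_id = None
    while True:
        query = dict(base)
        if last_id is not None:
            query["_id"] = {"$gt": last_id}
        page = list(collection.find(query, sort=[("_id", 1)], limit=batch_size))
        if not page:
            return
        yield page
        if len(page) < batch_size:
            return  # a short page means there is nothing left to read
        last_id = page[-1]["_id"]
```

For example, `iter_pages(MongoClient()["cms"]["documents"], doc_type="invoice")` would walk one type at a time (the database, collection, and type names are hypothetical), and `doc_type=None` gives the plain get-all across all types. A compound index on `type` and `_id` should keep each typed page an index range scan.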

In my understanding of key-value stores, the store knows nothing about the values it stores, and they can only be referenced by a key. This makes me wonder whether I could even perform a "get all" when I don't plan on maintaining a full list of the keys anywhere. I've seen that some key-value stores (Redis) support using dictionaries as the key. I'm not sure whether this means I could query by type (if type were an entry in the dictionary), or whether I would need to know the full dictionary to fetch the value.

Since repopulating the index should only be necessary after an Elasticsearch failure, performance is not my top priority (though it certainly wouldn't hurt). To me, MongoDB seems to be a near-perfect fit: I can store documents and easily query by type.

Given my use case, does a document store seem like a good decision?
Could this also be solved reasonably with a key-value store?
Are there any other advantages to using one over the other?

In case it matters, for document stores I've been comparing CouchDB, Couchbase, and MongoDB. For key-value stores, I've been looking at Redis and BerkeleyDB.

2 Answers

AFAIK, Redis does not allow using a dictionary as a key, except when using the SORT command with external keys. For your use case, using Redis would require you to maintain a list of all documents and/or a list per document type. Although that's entirely possible, and fairly simple, I don't really see the benefit of using Redis here. Redis shines when you need high performance. That isn't a requirement for you, so you'd be better off using a document database instead.
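To make the "maintain a list" approach concrete, here is a minimal sketch using redis-py-style calls. Each document is stored as a JSON string under `doc:<id>`, and its id is added to an all-documents index and a per-type index; all key names are hypothetical. It swaps the List mentioned above for a Redis Set so that saving the same document twice doesn't duplicate its id, and it reads an index with SSCAN plus batched MGET.

```python
import json
from typing import Any, Dict, Iterator, List, Optional


def _text(value: Any) -> Any:
    """redis-py returns bytes unless decode_responses=True; normalise to str."""
    return value.decode() if isinstance(value, bytes) else value


def save_document(client: Any, doc_id: str, doc_type: str, doc: Dict[str, Any]) -> None:
    """Store `doc` as JSON and register its id in the all-docs and per-type sets.

    `client` is expected to behave like redis.Redis. The key names
    (doc:<id>, docs:all, docs:type:<type>) are hypothetical conventions.
    """
    pipe = client.pipeline()  # transactional (MULTI/EXEC) by default in redis-py
    pipe.set(f"doc:{doc_id}", json.dumps(doc))
    pipe.sadd("docs:all", doc_id)
    pipe.sadd(f"docs:type:{doc_type}", doc_id)
    pipe.execute()


def _fetch(client: Any, ids: List[str]) -> Iterator[Dict[str, Any]]:
    """Fetch a batch of documents in one round trip with MGET."""
    for raw in client.mget([f"doc:{i}" for i in ids]):
        if raw is not None:  # skip ids whose document was deleted meanwhile
            yield json.loads(raw)


def iter_documents(client: Any, doc_type: Optional[str] = None,
                   batch_size: int = 500) -> Iterator[Dict[str, Any]]:
    """Yield every stored document, or only those of `doc_type`."""
    index_key = "docs:all" if doc_type is None else f"docs:type:{doc_type}"
    seen = set()
    batch: List[str] = []
    # SSCAN walks the set incrementally and may repeat a member, hence `seen`
    for member in client.sscan_iter(index_key, count=batch_size):
        doc_id = _text(member)
        if doc_id in seen:
            continue
        seen.add(doc_id)
        batch.append(doc_id)
        if len(batch) == batch_size:
            yield from _fetch(client, batch)
            batch = []
    if batch:
        yield from _fetch(client, batch)
```

Note that every write path (and every delete, which this sketch omits) must keep the index sets in sync by hand; a document store's query by type avoids that bookkeeping.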

In Redis, you can get all the keys and values with a bit of work, using the following commands:

- SCAN: to iterate over all the keys.
- TYPE: to determine the type of each key (string, hash, list, etc.).
- MGET/HGETALL/etc.: to get the actual values, depending on which structure the key refers to.

SCAN is also conveniently exposed for dumping everything via 'redis-cli --scan', as well as in many client libraries (e.g. for Python).

You might need to write some code to make this work for your particular scenario; hopefully it shouldn't be too difficult.

NB: there is also a KEYS command (which does something similar to SCAN), but it is not recommended for live production use: it returns all matching keys in a single call, which can block the server for a long time on a large database. That said, nothing stops you from building a separate, independent slave instance: replicate from the master, disconnect it from the master, and then use the slave however you wish without any impact on anything serving live traffic.