Speed, CouchDB views and alternatives

Original post: 2010-08-12 21:59:12 · Tags: database, performance, couchdb

I have a large data set that I want to query. The query does not change, but the underlying data does. From what I have read, I could construct a "view" and query it. I have also read that CouchDB knows how to update the view when the data changes, so I assume querying the view again would still be fast.

My questions are: do I understand CouchDB's views correctly? I don't need any other feature of CouchDB, and I don't even need SQL; all I want is a fast, unchanging query over changing data. Could I use something else? If I used, say, good old MySQL, would it really be slower than CouchDB? (Read: in the above scenario, roughly how would various databases perform?)

2 answers

Your assessment is completely correct. Enjoy!

The only performance trick worth mentioning is that you may see a boost if you emit() all of the data you need from the view and never use the ?include_docs feature, because include_docs causes CouchDB to go back into the main database and retrieve the original doc that produced that view row. In other words, you can emit() everything you need into your view index (more space but faster), or you can use the reference back to the original document (less space but slower).
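To make the trade-off concrete, here is a minimal sketch of the two variants. Everything specific in it is an assumption for illustration: the document shape (`type`, `customer`, `total`, `date`), the database name `shop`, the design document name `orders`, and the server address. The map functions are ordinary CouchDB JavaScript held in Python strings; the helper only builds query URLs and does not contact a server.

```python
import json
from urllib.parse import urlencode

# Variant A: emit every field the query needs, so each view row is self-sufficient.
# (Document fields here are hypothetical.)
MAP_EMIT_ALL = """
function (doc) {
  if (doc.type === 'order') {
    emit(doc.customer, { total: doc.total, date: doc.date });
  }
}
"""

# Variant B: emit only the key; the client fetches the full doc via include_docs.
MAP_KEY_ONLY = """
function (doc) {
  if (doc.type === 'order') {
    emit(doc.customer, null);
  }
}
"""

# Design document holding both views (names are assumptions).
design_doc = {
    "_id": "_design/orders",
    "language": "javascript",
    "views": {
        "by_customer_full": {"map": MAP_EMIT_ALL},
        "by_customer_ref": {"map": MAP_KEY_ONLY},
    },
}

def view_url(base, db, ddoc, view, **params):
    """Build a view query URL; parameter values are JSON-encoded as CouchDB expects."""
    query = urlencode({name: json.dumps(value) for name, value in params.items()})
    return f"{base}/{db}/_design/{ddoc}/_view/{view}?{query}"

BASE = "http://localhost:5984"  # assumed local server

# Larger index, but rows come straight from the view.
fast = view_url(BASE, "shop", "orders", "by_customer_full", key="alice")

# Smaller index, but each row triggers a lookup of the original document.
slow = view_url(BASE, "shop", "orders", "by_customer_ref", key="alice", include_docs=True)
```

Which variant wins in practice depends on document size and how many rows a query returns, so it is worth timing both against your own data.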

I don't think anyone can answer your question given the information you have provided.

Indexes in a relational database are analogous to CouchDB views. In both cases, they store a pre-sorted copy of the data, and the database keeps that copy in sync with the canonical data. Both types of database use the index/view to speed up subsequent queries of the form that the index/view was designed for (a relational database picks the index transparently, whereas in CouchDB you query the view by name).

Without indexes/views, a query must scan the whole collection of n records, so it executes in O(n) time. When a query benefits from an index/view, it executes in O(log n) time.
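The difference can be illustrated with a small in-memory sketch. This is not how either database is implemented (both use B-tree structures on disk); it only contrasts a full scan with a binary search over a pre-sorted list of (key, id) pairs. The record layout and the integer `score` key are made up for the example.

```python
import bisect

# Hypothetical data set: 100,000 records with an integer "score" field.
records = [{"id": i, "score": (i * 7919) % 1000} for i in range(100_000)]

def scan(records, score):
    """O(n): inspect every record on every query."""
    return [r["id"] for r in records if r["score"] == score]

# Built once, like an index or view: (key, id) pairs sorted by key.
index = sorted((r["score"], r["id"]) for r in records)

def lookup(index, score):
    """O(log n) to locate the key range, plus the number of matching rows."""
    lo = bisect.bisect_left(index, (score,))
    hi = bisect.bisect_left(index, (score + 1,))  # relies on integer keys
    return [doc_id for _, doc_id in index[lo:hi]]

# Both approaches return the same ids; only the work per query differs.
assert scan(records, 42) == lookup(index, 42)
```

The price is paid when the data changes: every insert, update, or delete must also update the sorted structure, which is the "keeps it in sync" work both kinds of database do for you.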

But that's speaking very broadly about how performance scales with the volume of data. A given database could be so fast in certain cases that it outperforms another product regardless. It's hard to generalize that brand X is always faster than brand Y. The only way to be sure about a specific case is to try that case in both databases and measure the performance.
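A rough way to do that measurement is a small timing harness like the sketch below. The two query functions are placeholders, not real database calls: replace their bodies with your actual CouchDB view request and your actual MySQL query, run both against the same data, and compare.

```python
import time

def time_query(run_query, repeats=5):
    """Return the best wall-clock time in seconds over several runs."""
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        best = min(best, time.perf_counter() - start)
    return best

# Placeholder workloads (assumptions): swap in real client calls.
def query_couchdb_view():
    return sum(range(10_000))

def query_mysql():
    return sum(range(10_000))

results = {
    "couchdb": time_query(query_couchdb_view),
    "mysql": time_query(query_mysql),
}
for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
```

Taking the best of several runs reduces noise from caching and other processes; for a fair comparison, also measure while the data is being modified, since that is the scenario in the question.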