
MongoDB Performance Based on Document Size

I've been playing around with the samus mongodb driver, particularly the benchmark tests. From the output, it appears the size of the documents can have a drastic effect upon how long operations on those collections take.

[image: benchmark output]

Is there some documentation available that recommends what balance to strive for, or some more "real" numbers on what document size will do to query times? Is this poor performance more a result of the driver and any serialization overhead? Has anyone else noticed this?

I cannot find a link right now, but the on-disk format of the database is such that it should not matter whether a document is large or small. For access via an index, there is certainly no difference; for a table scan, uninteresting documents (or uninteresting parts of documents) can be skipped quickly thanks to the BSON format. If anything, the overhead of the BSON format affects tiny documents more than large ones.
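The skipping the answer describes comes from BSON's length prefixes: every document starts with a 4-byte little-endian total length, so a scanner can jump over a whole document with one read, regardless of its size. A minimal sketch in Node.js (hand-encoding two tiny `{name: int32}` documents; this is illustrative, not MongoDB's actual scan code):

```javascript
// Sketch: how BSON length prefixes let a scanner skip documents cheaply.
// We hand-encode two minimal BSON documents ({"a": 1} and {"b": 2}) and
// then walk the concatenated buffer using only each leading int32 length.

function encodeInt32Doc(name, value) {
  // BSON doc: int32 totalLen | 0x10 (int32 element) | cstring name | int32 value | 0x00
  const nameBytes = Buffer.from(name + "\0", "utf8");
  const totalLen = 4 + 1 + nameBytes.length + 4 + 1;
  const buf = Buffer.alloc(totalLen);
  let off = buf.writeInt32LE(totalLen, 0);
  off = buf.writeUInt8(0x10, off);     // element type: 32-bit integer
  off += nameBytes.copy(buf, off);     // field name, NUL-terminated
  off = buf.writeInt32LE(value, off);  // field value
  buf.writeUInt8(0x00, off);           // end-of-document marker
  return buf;
}

const stream = Buffer.concat([encodeInt32Doc("a", 1), encodeInt32Doc("b", 2)]);

// Skipping: no matter how large a document is, advancing to the next one
// costs a single 4-byte read -- no parsing of the document body is needed.
let offset = 0;
const docLengths = [];
while (offset < stream.length) {
  const len = stream.readInt32LE(offset);
  docLengths.push(len);
  offset += len; // jump over the whole document
}
console.log(docLengths); // each minimal {name: int32} doc is 12 bytes here
```

The same prefix trick applies inside a document: string and embedded-document values also carry length prefixes, which is why uninteresting *parts* of a document can be skipped too.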

So I would assume that the performance drop you see is largely due to the serialization cost of loading those documents (of course it takes more time to write a large document to disk than a small one, but it should be about the same for multiple small documents of the same aggregate size).

In your benchmark, can you normalize the numbers to be based on the same amount of data (in bytes, not in document count)?
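The normalization the answer asks for can be as simple as dividing aggregate bytes moved by elapsed time. A sketch with hypothetical numbers (not taken from the benchmark above):

```javascript
// Hypothetical numbers for illustration only -- not from the benchmark above.
// Normalizing elapsed time by bytes moved (rather than document count) makes
// runs over different document sizes directly comparable.
function throughputMBps(docCount, docSizeBytes, elapsedMs) {
  const totalBytes = docCount * docSizeBytes;
  return (totalBytes / (1024 * 1024)) / (elapsedMs / 1000);
}

// 10,000 small docs of 1 KB vs 100 large docs of 100 KB: same aggregate size
// (~9.8 MB), so if document size were truly neutral, throughput should match.
const small = throughputMBps(10000, 1024, 500);
const large = throughputMBps(100, 102400, 500);
console.log(small.toFixed(2), large.toFixed(2)); // identical by construction
```

If the real benchmark shows very different MB/s at the same aggregate size, that difference is what needs explaining (per-document overhead, serialization, etc.).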

You can turn on profiling with db.setProfilingLevel(2) and query db.system.profile for details on the executed queries.

Although this may distort the test results a little, it will give you insight into the query times on the server, eliminating any influence the driver or network may have on the results. If these query times show the same pattern as your test, then the document size does influence query times. If query times are roughly the same regardless of document size, then it's serialization overhead you're looking at.
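The suggestion above can be run directly in the mongo shell against a live server; a sketch (the `docs` collection name is illustrative, and this requires a running mongod):

```javascript
// Run inside the mongo shell against the database under test.
// Level 2 profiles every operation; level 1 profiles only those slower
// than the slowms threshold. Level 0 turns profiling off again.
db.setProfilingLevel(2)

// ...re-run the benchmark queries here...

// Inspect server-side execution time (the `millis` field) per operation,
// newest first. The `docs` collection name is illustrative.
db.system.profile.find({ ns: db.getName() + ".docs" })
  .sort({ ts: -1 })
  .limit(10)

// Turn profiling back off when done.
db.setProfilingLevel(0)
```

Comparing the `millis` values across the small/medium/large runs separates server-side query time from driver-side serialization cost, which is exactly the distinction the answer is drawing.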

But is it a good benchmark? I don't think so. Read Mongodb performance on Windows.

I think the exception that happens when the index should have been created is still swallowed. FindOne() medium returns 363 with and without the "creation" of the index.

