

mongoDB vs relational databases when data can't fit into memory?

First of all, I apologize for my potentially shallow understanding of NoSQL architecture (and databases in general), so try to bear with me.

I'm thinking of using mongoDB to store resources associated with a UUID. The resources can be things such as large image files (tens of megabytes), so it makes sense to store them as files and keep just links in my database along with the associated metadata. There's also the added flexibility of decoupling the actual location of the resource files, so I can use a different third party to store the files if I need to.
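To make this concrete, here is a minimal sketch (assuming pymongo; the database, collection and field names are just placeholders I made up) of the kind of ~1 kB resource document I have in mind, with the large file itself stored elsewhere and only a link plus metadata kept in MongoDB:

```python
import uuid
from datetime import datetime, timezone

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
resources = client["mydb"]["resources"]

# The large file lives with a third party; the document only holds a link
# and the metadata I want to query on.
doc = {
    "uuid": str(uuid.uuid4()),                       # lookup key for the resource
    "file_url": "https://files.example.com/abc123",  # where the actual bytes live
    "content_type": "image/png",
    "size_bytes": 34567890,
    "created_at": datetime.now(timezone.utc),
    "tags": ["user-upload", "original"],
}
resources.insert_one(doc)
```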

Now, one document which describes a resource would be about 1 kB. At first I expect a couple hundred thousand resource documents, which would equal some hundreds of megabytes in database size, easily fitting into server memory. But in the future I might have to scale this into the order of tens of MILLIONS of documents. This would be tens of gigabytes, which I can't squeeze into server memory anymore.

Only the index could still fit in memory, being around a gigabyte or two. But if I understand correctly, I'd have to read from disk every time I did a lookup on a UUID. Is there a substantial speed benefit from mongoDB over a traditional relational database in such a situation?

BONUS QUESTION: is there an existing, established way of doing what I'm trying to achieve? :)

MongoDB doesn't suddenly become slow the second the entire database no longer fits into physical memory. MongoDB currently uses a storage engine based on memory-mapped files. This means data that is accessed often will usually be in memory (OS managed, but assume an LRU scheme or something similar).

As such, it may not slow down at all at that point, or only slightly; it really depends on your data access patterns. It's a similar story with indexes: if you balance your index appropriately and your use case allows it, you can have a huge index with only a fraction of it in physical memory and still get very decent performance, with the majority of index hits happening in physical memory.
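For the UUID lookup case described in the question, a single index on the uuid field is what keeps point lookups on index pages rather than full documents. A rough sketch (pymongo assumed, collection name illustrative; the exact explain() output format depends on the MongoDB version):

```python
from pymongo import MongoClient

resources = MongoClient("mongodb://localhost:27017")["mydb"]["resources"]

# A unique index on uuid: point lookups are resolved from index pages,
# which are far more likely to be resident in RAM than the documents.
resources.create_index("uuid", unique=True)

# explain() shows whether the lookup actually used the index.
print(resources.find({"uuid": "some-uuid-value"}).explain())
```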

Because you're talking about UUIDs, this might all be a bit hard to achieve, since there's no guarantee that the same limited group of users is generating the vast majority of the throughput. In those cases sharding really is the most appropriate way to maintain quality of service.

  This would be tens of gigabytes which I can't squeeze into server memory anymore.

That's why MongoDB gives you sharding to partition your data across multiple mongod instances (or replica sets).
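As a rough sketch of what that could look like (assuming a sharded cluster is already running and we connect through a mongos router; names are placeholders), a hashed shard key on the uuid field spreads randomly generated UUIDs evenly across shards:

```python
from pymongo import MongoClient

mongos = MongoClient("mongodb://mongos.example.com:27017")

# Enable sharding for the database, then shard the collection on a hashed
# uuid key so documents are distributed evenly across the shards.
mongos.admin.command("enableSharding", "mydb")
mongos.admin.command("shardCollection", "mydb.resources", key={"uuid": "hashed"})
```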

In addition to considering sharding, or maybe even before, you should also try to use covered indexes as much as possible, especially if they fit your use cases.

This way you do not HAVE to load entire documents into memory. Your indexes can help out.

http://www.mongodb.org/display/DOCS/Retrieving+a+Subset+of+Fields#RetrievingaSubsetofFields-CoveredIndexes
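For example, something along these lines (pymongo assumed, field names illustrative): if the index contains every field the query returns, and _id is excluded from the projection, MongoDB can answer the query from the index alone without touching the documents:

```python
from pymongo import MongoClient

resources = MongoClient("mongodb://localhost:27017")["mydb"]["resources"]

# Compound index holding both the lookup key and the field we return.
resources.create_index([("uuid", 1), ("file_url", 1)])

# Covered query: only indexed fields are requested and _id is excluded,
# so the documents themselves never need to be loaded.
doc = resources.find_one(
    {"uuid": "some-uuid-value"},
    {"_id": 0, "uuid": 1, "file_url": 1},
)
```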

If you have to display your entire document all the time based on the id, then the general rule of thumb is to attempt to keep the working set in memory.

http://blog.boxedice.com/2010/12/13/mongodb-monitoring-keep-in-it-ram/

This is one of the resources that talks about that. There is a video on mongodb's site too that speaks about this.

By attempting to size the RAM so that the working set is in memory, and also looking at sharding, you will not have to do this right away; you can always add sharding later. This will improve the scalability of your app over time.
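As a back-of-the-envelope check (not an exact working-set measurement; the database name is a placeholder), you can compare the data and index sizes reported by dbStats against the RAM you plan to provision:

```python
from pymongo import MongoClient

db = MongoClient("mongodb://localhost:27017")["mydb"]
stats = db.command("dbStats")

gib = 1024 ** 3
print("data size : %.2f GiB" % (stats["dataSize"] / gib))
print("index size: %.2f GiB" % (stats["indexSize"] / gib))
```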

Again, these are not absolute statements, these are general guidelines; you should think through your usage patterns and make sure that they are relevant to what you are doing.

Personally, I have not had the need to fit everything in RAM.
