
MongoDB: separate collections for read and write for high performance

Original post: 2014-01-09 09:33:48. Tags: mongodb, database-design

I use MongoDB and want to design the database to meet high scalability requirements. Currently, let's say a collection A is heavily used for reads and writes. The writes would imply a lock (database lock now, hopefully collection lock in future releases), locking out the read operations.

My idea is to duplicate A into A and A-tmp, where both have the same schema. A holds all data while A-tmp is initially empty. New entries get inserted into A-tmp. Using a cron job, entries from A-tmp are periodically moved to A. When the application tries to look up data after a write, it will look in A and, if the data is not found there, subsequently look in A-tmp. Thus, A-tmp is mainly used for writes and is occasionally read from when entries are not found in A. A is mainly used for reads and is periodically written to from A-tmp.
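To make the idea concrete, here is a minimal sketch of the write, read and move paths. It is only an illustration: plain Python dicts stand in for the collections A and A-tmp, no MongoDB is involved, and the function names (insert, find_by_id, move_tmp_to_main) are made up for this example. A real version would issue the equivalent driver calls.

```python
# Plain dicts stand in for the two collections; this is NOT MongoDB code.
collection_a = {1: {"_id": 1, "name": "old entry"}}  # "A": holds all existing data
collection_a_tmp = {}                                # "A-tmp": initially empty


def insert(doc):
    """New entries are written to A-tmp only."""
    collection_a_tmp[doc["_id"]] = doc


def find_by_id(doc_id):
    """Look in A first; if the entry is not there, fall back to A-tmp."""
    doc = collection_a.get(doc_id)
    if doc is None:
        doc = collection_a_tmp.get(doc_id)
    return doc


def move_tmp_to_main():
    """What the periodic cron job would do: move every entry from A-tmp to A."""
    moved = 0
    for doc_id in list(collection_a_tmp):
        collection_a[doc_id] = collection_a_tmp.pop(doc_id)
        moved += 1
    return moved


insert({"_id": 2, "name": "new entry"})
found_before = find_by_id(2)   # served from A-tmp (second lookup)
moved = move_tmp_to_main()     # the cron step
found_after = find_by_id(2)    # now served from A (first lookup)
```

Note that every lookup that misses in A costs a second lookup in A-tmp, and the move step still writes to A.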

Is this a reasonable solution? Or does this give little to no advantage? Or is this handled for me anyway when I move to replication and sharding with additional hardware?

2 Answers

The writes would imply a lock (database lock now, hopefully collection lock in the future releases), locking out the read operations.

It wouldn't just automatically lock out reads. The lock is writer-greedy, but there are rules under which it yields to reads, etc.

I will just paste this link: http://docs.mongodb.org/manual/faq/concurrency/

Using a cronjob entries from A-tmp are periodically moved to A.

Sounds simple.

Or does this give little to no advantage?

It is worth noting that your title mentions "db", but your question describes A and A-tmp as both being collections.

I will answer on the basis of collections.

No, there is not much benefit to separating them unless there is a serious logical reason to do so, i.e. application/schema design.

Or is this handled for me anyway when I move to replication and sharding with additional hardware?

Such a thing would not be handled for you. Replication would replicate your database(s) to other members of the set, while sharding would distribute your database(s) across multiple machines.

They are completely different things from what you describe.

Your scenario does not seem to differ from high-availability replication: a replica set will give you the behaviour you want from A-tmp, which is the same behaviour as that of the secondary nodes in the replica set. You will require additional hardware, but operationally, using a replica set will be much easier than managing a cron job.
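As a rough sketch of what this could look like from the application side, the snippet below builds a MongoDB connection string that names a replica set and asks for reads to go to secondaries when one is available. The host names and the replica set name rs0 are placeholders invented for this example; replicaSet and readPreference are standard connection string options.

```python
from urllib.parse import urlencode

# Hypothetical host names and replica set name, for illustration only.
hosts = [
    "db1.example.net:27017",
    "db2.example.net:27017",
    "db3.example.net:27017",
]
options = {
    "replicaSet": "rs0",                     # assumed name of the replica set
    "readPreference": "secondaryPreferred",  # read from a secondary when one is available
}

# Build the connection string; a driver would accept it as-is.
uri = "mongodb://" + ",".join(hosts) + "/?" + urlencode(options)
print(uri)
```

Keep in mind that secondaries replicate asynchronously, so a read from a secondary can return data that is slightly behind the primary.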

In a high-availability scenario with MongoDB, you should consider what level of fault tolerance you want to support, that is, how many members can become unavailable before the set is unable to elect a new primary. This and a number of other HA concerns are documented here.
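To illustrate the fault-tolerance question: a replica set can elect a primary only while a majority of its voting members are available, so the number of members that may fail follows directly from the set size. The helper names below are made up for this example and are not part of any MongoDB API.

```python
def majority(voting_members: int) -> int:
    """Smallest number of voting members that forms a majority."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members: int) -> int:
    """How many voting members can be unavailable while a primary can still be elected."""
    return voting_members - majority(voting_members)


# Print set size, required majority, and tolerated failures.
for n in (3, 4, 5):
    print(n, majority(n), fault_tolerance(n))
```

Going from three members to four does not increase the fault tolerance, which is why odd-sized sets are the usual choice.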