
MongoDB: separate collections for read and write for high performance

I use MongoDB and want to design the database to meet high scalability requirements. Currently, let's say a collection A is heavily used for reads and writes. The writes would imply a lock (database lock now, hopefully collection lock in the future releases), locking out the read operations.

My idea is to duplicate A into A and A-tmp, where both have the same schema. A holds all data while A-tmp is initially empty. New entries get inserted into A-tmp. Using a cronjob entries from A-tmp are periodically moved to A. When the application tries to look up data after a write, it will look in A and, if the data is not found there, subsequently look in A-tmp. Thus, A-tmp is mainly used for writes and occasionally read from when entries are not found in A. A is mainly used for reads and periodically written to from A-tmp.
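The scheme described above can be sketched roughly as follows. This is only an illustration of the idea, not MongoDB driver code: `InMemoryCollection` is a hypothetical in-memory stand-in for a collection, and `write`, `lookup` and `flush` are made-up names for the insert path, the two-step read and the cron job.

```python
class InMemoryCollection:
    """Hypothetical stand-in for a collection; documents are keyed by _id."""

    def __init__(self):
        self._docs = {}

    def insert_one(self, doc):
        if doc["_id"] in self._docs:
            raise KeyError(f"duplicate _id: {doc['_id']}")
        self._docs[doc["_id"]] = dict(doc)

    def upsert(self, doc):
        # Insert the document, or overwrite an existing one with the same _id.
        self._docs[doc["_id"]] = dict(doc)

    def find_one(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def all_docs(self):
        return [dict(d) for d in self._docs.values()]

    def delete_one(self, doc_id):
        self._docs.pop(doc_id, None)

    def count(self):
        return len(self._docs)


def write(a_tmp, doc):
    """New entries only ever go into A-tmp."""
    a_tmp.insert_one(doc)


def lookup(a, a_tmp, doc_id):
    """Look in A first; fall back to A-tmp only when A has no match."""
    doc = a.find_one(doc_id)
    if doc is None:
        doc = a_tmp.find_one(doc_id)
    return doc


def flush(a, a_tmp):
    """What the cron job would do: move every entry from A-tmp into A."""
    moved = 0
    for doc in a_tmp.all_docs():
        a.upsert(doc)                  # copy first, so the entry is never missing
        a_tmp.delete_one(doc["_id"])   # then remove it from the staging side
        moved += 1
    return moved


a, a_tmp = InMemoryCollection(), InMemoryCollection()
write(a_tmp, {"_id": 1, "name": "new entry"})
found_before = lookup(a, a_tmp, 1)   # served by the fallback read on A-tmp
moved = flush(a, a_tmp)
found_after = lookup(a, a_tmp, 1)    # now served by A
```

Note that every miss in A costs a second query, and that `flush` is itself a batch of writes against A.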

Is this a reasonable solution? Or does this give little to no advantage? Or is this handled for me anyway when I move to replication and sharding with additional hardware?

The writes would imply a lock (database lock now, hopefully collection lock in the future releases), locking out the read operations.

It wouldn't automatically lock out reads: the lock is writer-greedy, but there are rules under which it yields to reads, etc.

I will just paste this link: http://docs.mongodb.org/manual/faq/concurrency/

Using a cronjob entries from A-tmp are periodically moved to A.

Sounds simple.

Or does this give little to no advantage?

Now it is good to note that your title mentions "db" but your question mentioned A and A-tmp both being collections.

I will answer on the basis of collections.

No, there is not much benefit to separating them unless there is a serious logical reason to do so, i.e. application/schema design.

Or is this handled for me anyway when I move to replication and sharding with additional hardware?

Such a thing would not be handled for you. Replication replicates your database(s) to the other members of the set, while sharding distributes your database(s) across multiple machines.

Both are completely different things from what you describe.

Your scenario does not seem to differ from high-availability replication: a replica set will give you the behaviour you want from A-tmp, which is the same behaviour that the secondary nodes in a replica set provide. You will require additional hardware, but operationally a replica set will be much easier to manage than a cron job.
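As a rough illustration of pointing reads at secondaries, the sketch below builds a MongoDB connection string using the standard `replicaSet` and `readPreference` options. The host names and the set name `rs0` are placeholders, and `replica_set_uri` is a hypothetical helper, not part of any driver. Keep in mind that reads from secondaries can lag behind the primary.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, read_preference="secondaryPreferred"):
    """Build a connection string that prefers secondaries for reads.

    hosts and replica_set are placeholders to be replaced with real values.
    """
    options = urlencode({
        "replicaSet": replica_set,
        "readPreference": read_preference,
    })
    return f"mongodb://{','.join(hosts)}/?{options}"


uri = replica_set_uri(
    ["db1.example.com:27017", "db2.example.com:27017", "db3.example.com:27017"],
    "rs0",
)
print(uri)
```

The resulting string would be handed to whatever driver the application uses. Writes still go to the primary; only reads are routed to secondaries when one is available.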

In a high-availability scenario with MongoDB you should consider what level of fault tolerance you want to support, that is, how many members can become unavailable before the set is unable to elect a new primary. This and a number of other HA concerns are documented here.
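To make the fault tolerance point concrete: electing a primary needs a majority of the voting members, so the number of members that can be lost follows directly from the set size. A small sketch (the function names are just for illustration):

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """Members that can be unavailable while a primary can still be elected."""
    return voting_members - majority(voting_members)


for n in (3, 4, 5):
    print(f"{n} members: majority {majority(n)}, tolerates {fault_tolerance(n)} down")
```

This is why going from three to four members does not buy any extra fault tolerance, while going to five does.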
