Synchronise two MongoDB collections

I have two MongoDB instances on two different servers. Each has a collection named items. The first collection holds production data and receives a lot of insert and update operations; the second one is empty.

Now my task is to transfer the data from the first collection to the second, and to keep them in sync for a few hours.

We already implemented the oplog solution. But since we lack permission to listen to the local database on the first MongoDB instance, we must find another way.

One approach I thought of is to create two services that work as follows:

  • On the first call, query all data from the first collection, transfer it to the second collection, and keep a copy of that data in memory.
  • On the second call, query all data from the first collection again, diff it against the in-memory copy with some tool, and send the diff to the second collection.
  • Repeat until one of the two services is taken down.
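The diff step in this approach could look roughly like the sketch below. It assumes every document has a unique _id and that both snapshots fit in memory; the names snapshot and diff_snapshots are hypothetical, and plain dicts stand in for the query results.

```python
def snapshot(docs):
    """Index an iterable of documents by their _id (hypothetical helper)."""
    return {doc["_id"]: doc for doc in docs}


def diff_snapshots(previous, current):
    """Compare two snapshots and return the changes to replay on the target."""
    # Documents that exist now but did not exist in the previous snapshot.
    inserts = [doc for _id, doc in current.items() if _id not in previous]
    # Documents present in both snapshots whose content changed.
    updates = [
        doc for _id, doc in current.items()
        if _id in previous and previous[_id] != doc
    ]
    # Ids that disappeared since the previous snapshot.
    deletes = [_id for _id in previous if _id not in current]
    return inserts, updates, deletes
```

Note that every round still has to read and compare the entire collection, no matter how few documents actually changed.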

The obvious problem is the huge waste of resources for querying and comparing data.

So I need your help to find another way to solve this issue.

Thanks in advance.

HP

The solution you described in your OP:

One approach I thought of is to create two services that work as follows:

  • On the first call, query all data from the first collection, transfer it to the second collection, and keep a copy of that data in memory.
  • On the second call, query all data from the first collection again, diff it against the in-memory copy with some tool, and send the diff to the second collection.
  • Repeat until one of the two services is taken down.

... makes me think this is some sort of blue/green deployment model, or perhaps your intention is to provide resilience in the face of losing the Mongo store for the first collection. If so, then I think the correct approach is to use a MongoDB replica set and let MongoDB look after resilience for you.

However, I may be missing something ... perhaps there is some detail of your situation which (a) I have not been able to infer from your question and (b) demands some sort of manual, near real-time copy from one collection to the other. If so, then I think the oplog approach is the usual solution for this use case. Perhaps you should revisit it to see if you can overcome this issue:

we lack permission to listen to the local database on the first MongoDB instance

If that's really not an option, then if you can intercept all writes to the first collection (i.e. if your application provides a chokepoint or hook for applying behaviour to all writes), you could implement something like this:

  • Before you proceed with the write, wrap up the incoming command (i.e. the data and the type of write: insert|update|delete) in some sort of executable task
  • Put that task on a queue
  • Provide a thread pool which acts on these tasks, applying each task's command to the second collection.

For example:

  • Receive an INSERT with data
    • Apply this insert to the first collection (as you would normally do)
    • Asynchronously (so as not to adversely affect application throughput) apply this insert to the second collection
  • Receive a DELETE for entity 123
    • Delete entity 123 from the first collection
    • Asynchronously delete entity 123 from the second collection
  • ... etc
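
A minimal sketch of this idea in Python, under several assumptions: MirroredWriter and InMemoryCollection are hypothetical names, and the in-memory class is only a stand-in so the example runs without a server. With a real driver, both collections would be collection objects exposing insert_one, update_one and delete_one, as pymongo collections do. The sketch also uses a single worker thread rather than a pool, so that writes reach the second collection in the order they were made; a pool would need some way to keep writes to the same document in order.

```python
import queue
import threading


class InMemoryCollection:
    """Stand-in for a MongoDB collection (hypothetical, for illustration only)."""

    def __init__(self):
        self.docs = {}
        self._lock = threading.Lock()

    def insert_one(self, doc):
        with self._lock:
            self.docs[doc["_id"]] = dict(doc)

    def update_one(self, flt, update):
        with self._lock:
            if flt["_id"] in self.docs:
                self.docs[flt["_id"]].update(update["$set"])

    def delete_one(self, flt):
        with self._lock:
            self.docs.pop(flt["_id"], None)


class MirroredWriter:
    """Writes to the first collection, then queues the same write for the second."""

    def __init__(self, first, second):
        self.first = first
        self.second = second
        self.tasks = queue.Queue()
        self.failed = []  # writes that could not be replayed on the second collection
        # Assumption: one worker keeps replayed writes in their original order.
        threading.Thread(target=self._drain, daemon=True).start()

    def _drain(self):
        while True:
            op, args = self.tasks.get()
            try:
                getattr(self.second, op)(*args)
            except Exception as exc:
                self.failed.append((op, args, exc))
            finally:
                self.tasks.task_done()

    def _write(self, op, *args):
        getattr(self.first, op)(*args)  # synchronous write, as the application does today
        self.tasks.put((op, args))      # replayed asynchronously by the worker

    def insert(self, doc):
        self._write("insert_one", doc)

    def update(self, doc_id, fields):
        self._write("update_one", {"_id": doc_id}, {"$set": fields})

    def delete(self, doc_id):
        self._write("delete_one", {"_id": doc_id})

    def flush(self):
        """Block until every queued write has been processed."""
        self.tasks.join()
```

For example, calling writer.insert({"_id": 123, "name": "widget"}) followed by writer.delete(123) and writer.flush() leaves both collections without entity 123. The queue here lives in process memory, so queued writes are lost if the application stops before they are replayed.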
