
Synchronise two MongoDB collections

Original post: 2017-09-26 08:47:19 | Tags: node.js, mongodb

I have two MongoDB instances on two different servers. Each has a collection named items. The first collection holds production data and receives a lot of insert and update operations; the second one is empty.

Now my task is to transfer the data from the first collection to the second, and to keep the two in sync for a few hours.

We already implemented the oplog solution. But since we lack permission to listen to the local collection in the first mongodb, we must find another way.

One approach I thought of is to create 2 services:
- In the first call, I query all data from the 1st collection and transfer it to the 2nd collection. Then I keep that data in memory.
- In the second call, I query all data from the 1st collection again, use a tool to diff it against the data in memory, then send the diff to the 2nd collection.
- Repeat until one of the 2 services is taken down.

The obvious problem is the huge waste of resources spent repeatedly querying and comparing the full data set.
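
To make that cost concrete, the diff step might look roughly like the sketch below. It is only an illustration under assumptions: diffSnapshots is a hypothetical helper, documents are assumed to carry a unique _id, and comparing with JSON.stringify is a simplification that is sensitive to key order. Note that every round has to load and compare the whole collection, which is exactly the waste described above.

```javascript
// Hypothetical helper: compare two full snapshots of the collection.
// `previous` and `current` are arrays of documents with a unique _id.
function diffSnapshots(previous, current) {
  // Index the previous snapshot by _id for constant-time lookups.
  const before = new Map(previous.map((doc) => [String(doc._id), doc]));
  const inserts = [];
  const updates = [];

  for (const doc of current) {
    const key = String(doc._id);
    const old = before.get(key);
    if (old === undefined) {
      inserts.push(doc); // not seen in the previous snapshot
    } else if (JSON.stringify(old) !== JSON.stringify(doc)) {
      updates.push(doc); // same _id, different content
    }
    before.delete(key); // whatever is left over afterwards was deleted
  }

  const deletes = [...before.values()].map((doc) => doc._id);
  return { inserts, updates, deletes };
}
```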

So I need your help to find another way to solve this issue.

Thanks in advance.

HP

1 Answer

The solution you described in your OP:

One approach I thought of is to create 2 services:
- In the first call, I query all data from the 1st collection and transfer it to the 2nd collection. Then I keep that data in memory.
- In the second call, I query all data from the 1st collection again, use a tool to diff it against the data in memory, then send the diff to the 2nd collection.
- Repeat until one of the 2 services is taken down.

... makes me think this is some sort of blue/green deployment model, or perhaps your intention is to provide resilience in the face of losing the Mongo store for the first collection. If so, then I think the correct approach is to use a Mongo replica set and let Mongo look after resilience for you.

However, I may be missing something: perhaps there is some detail of your situation which (a) I have not been able to infer from your question and (b) demands some sort of manual, near real-time copy from one collection to the other. If so, then I think the oplog solution is the common solution to this use case. Perhaps you should revisit that to see if you can overcome this issue:

we lack permission to listen to the local collection in the first mongodb

If that's really not an option, then if you can intercept all writes to the first collection (i.e. if your application provides a choke point or hook for applying behaviour to all writes), you could implement something like this:

- Before you proceed with the write, wrap up the incoming command (i.e. the data and the type of write: insert|update|delete) in some sort of executable task.
- Put that task on a queue.
- Provide a thread pool which acts on these tasks, applying each task's command to the second collection.

For example:

Receive an INSERT with data
- Apply this insert to the first collection (as you would normally do)
- Asynchronously (so as not to adversely affect application throughput) apply this insert to the second collection
Receive a DELETE for entity 123
- Delete entity 123 from the first collection
- Asynchronously delete entity 123 from the second collection
... etc
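
The steps above could be sketched in Node.js roughly as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: MirrorQueue, applyTask and write are hypothetical names, and plain Map objects stand in for the two collections so that the example runs without a database. In a real service applyTask would issue the equivalent insert, update or delete through the MongoDB driver. A single asynchronous drain loop takes the place of the thread pool here, which also keeps the mirrored writes in their original order.

```javascript
// Hypothetical stand-in for a collection write: `store` is a Map keyed by _id.
async function applyTask(store, task) {
  switch (task.type) {
    case 'insert':
      store.set(task.doc._id, { ...task.doc });
      break;
    case 'update':
      if (store.has(task.id)) {
        store.set(task.id, { ...store.get(task.id), ...task.changes });
      }
      break;
    case 'delete':
      store.delete(task.id);
      break;
    default:
      throw new Error(`unknown write type: ${task.type}`);
  }
}

// Queue that applies tasks to the second collection in the background.
class MirrorQueue {
  constructor(applyToSecondary) {
    this.applyToSecondary = applyToSecondary; // async (task) => void
    this.tasks = [];
    this.failed = [];  // tasks that could not be mirrored, kept for retry or inspection
    this.running = false;
    this.done = Promise.resolve();
  }

  // Enqueue and return immediately so the caller is not slowed down.
  push(task) {
    this.tasks.push(task);
    if (!this.running) this.done = this.drain();
  }

  // Apply queued tasks one at a time, in the order they were received.
  async drain() {
    this.running = true;
    try {
      while (this.tasks.length > 0) {
        const task = this.tasks.shift();
        try {
          await this.applyToSecondary(task);
        } catch (err) {
          this.failed.push({ task, err });
        }
      }
    } finally {
      this.running = false;
    }
  }

  // Resolves once everything queued so far has been processed.
  async idle() {
    while (this.running) await this.done;
  }
}

// The single hook through which all writes pass.
async function write(primary, queue, task) {
  await applyTask(primary, task); // the normal write to the first collection
  queue.push(task);               // mirrored to the second collection asynchronously
}
```

Only writes that succeed on the first collection are queued, and a failure on the second collection never blocks the application; the trade-off is that the second collection lags slightly behind and any entries in failed need a retry policy of your choosing.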