
MongoDB load balancing in multiple AWS instances

We're using Amazon Web Services for a business application that uses a node.js server and MongoDB as the database. Currently the node.js server is running on an EC2 medium instance, and we keep our MongoDB database in a separate micro instance. Now we want to deploy a replica set for our MongoDB database, so that if MongoDB gets locked or becomes unavailable, we can still run the database and read data from it.

So we're trying to keep each member of the replica set in a separate instance, so that we can still get data from the database even if the instance hosting the primary member shuts down.
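For reference, initiating such a replica set from the mongo shell would look roughly like the sketch below, assuming three mongod processes already started with --replSet rs0 (the host names are placeholders, not our real ones):

rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "ec2-a:27017" },  // intended primary
    { _id: 1, host: "ec2-b:27017" },  // secondary on a separate instance
    { _id: 2, host: "ec2-c:27017" }   // secondary on a separate instance
  ]
})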

Now I want to add load balancing to the database, so that it keeps working even under a heavy traffic load. In that case I can balance reads by adding the slaveOK config to the replica set. But that won't load-balance the database if there is a heavy write load.
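For example, reads could be spread across the secondaries roughly like this (rs.slaveOk() is the legacy shell flag; in the node.js driver the same thing is expressed as a read preference, and the connection string below is only a placeholder):

// In the legacy mongo shell, allow reads on a secondary:
rs.slaveOk()
// From the node.js driver, via a read preference in the connection string (placeholder URI):
// mongodb://ec2-a:27017,ec2-b:27017,ec2-c:27017/mydb?replicaSet=rs0&readPreference=secondaryPreferred

But this only spreads reads, not writes.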

To solve this problem I have two options so far.

Option 1: I have to shard the database and keep each shard in a separate instance, and under each shard there will be a replica set in the same instance. But there is a problem: since sharding divides the database into multiple parts, each shard will not hold the same data. So if one instance shuts down, we won't be able to access the data of the shard on that instance.

To solve this problem I'm trying to divide the database into shards where each shard has a replica set spread across separate instances. So even if one instance shuts down, we won't face any problem. But if we have 2 shards and each shard has 3 members in its replica set, then I need 6 AWS instances. So I don't think it's the optimal solution.
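Roughly, that setup would be driven from a mongos router with commands like the sketch below (shard, host and collection names are placeholders):

sh.addShard("shard1rs/host1:27018,host2:27018,host3:27018")  // shard 1 backed by replica set shard1rs
sh.addShard("shard2rs/host4:27018,host5:27018,host6:27018")  // shard 2 backed by replica set shard2rs
sh.enableSharding("mydb")                                    // enable sharding for the database
sh.shardCollection("mydb.orders", { customerId: 1 })         // shard a collection on a chosen key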

Option 2: We can create a master-master configuration in MongoDB, meaning every node would be a primary with read/write access, but I would also like them to auto-sync with each other every so often, so they all end up being clones of each other. And all of these primaries would be on separate instances. But I don't know whether MongoDB supports this structure or not.

I haven't found any MongoDB docs or blog posts for this situation, so please suggest the best solution for this problem.

This won't be a complete answer by far; there are too many details, and I, like many others, could write an entire essay on this question. Since I don't have that kind of time to spare, I will add some commentary on what I see.

"Now I want to add load balancing to the database, so that it keeps working even under a heavy traffic load."

Replica sets are not designed to work like that. If you wish to load balance, you are in fact looking for sharding, which will allow you to do this.

Replication is for automatic failover.

"In that case I can balance reads by adding the slaveOK config to the replica set."

Since, to stay up to date, your members will be receiving just as many ops as the primary, this might not help as much as you expect.

In reality, instead of one server with many connections queued, you have many connections queueing on many servers for stale data, since member consistency is eventual, not immediate as in ACID technologies. That said, secondaries are typically only 32-odd ms behind, which is not enough lag to spare them the write load, so they will not give you much extra throughput if the primary is loaded.

Since reads ARE concurrent, you will get the same speed whether you are reading from the primary or a secondary. I suppose you could delay a slave to create a pause in the ops, but that would return massively stale data in exchange.
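(For completeness, a delayed member is set through the member options; a minimal sketch using the legacy slaveDelay option, with the member index and delay chosen arbitrarily:)

cfg = rs.conf()
cfg.members[2].priority = 0       // a delayed member must not be electable
cfg.members[2].hidden = true      // hide it from normal client reads
cfg.members[2].slaveDelay = 3600  // keep this member one hour behind the primary
rs.reconfig(cfg)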

Not to mention that MongoDB is not multi-master, so you can only write to one node at a time, which makes slaveOK not the most useful setting in the world any more, and I have seen numerous times where 10gen themselves recommend sharding over this setting.

"Option 2: We can create a master-master configuration in MongoDB..."

This would require your own coding. At that point you may want to consider actually using a database that supports multi-master replication: http://en.wikipedia.org/wiki/Multi-master_replication

This is because the speed you are looking for is most likely, in fact, in writes, not reads, as I discussed above.

"Option 1: I have to shard the database and keep each shard in a separate instance."

This is the recommended way, but you have found the caveat with it. This is unfortunately something that remains unsolved and that multi-master replication is supposed to solve; however, multi-master replication brings its own ship of plague rats to Europe, and I would strongly recommend you do some serious research before concluding that MongoDB cannot currently serve your needs.

You might be worrying about nothing, really, since the fsync queue is designed to deal with the IO bottleneck slowing down your writes (as it would in SQL), and reads are concurrent, so if you plan your schema and working set right you should be able to get a massive number of ops.

There is in fact a linked question around here from a 10gen employee that is very good to read: https://stackoverflow.com/a/17459488/383478 and it shows just how much throughput MongoDB can achieve under load.

It will grow soon with the new document-level locking that is already in the dev branch.

Option 1 is the recommended way, as pointed out by @Sammaye, but you would not need 6 instances; you can manage it with 4.

Assuming you need the configuration below:

  • 2 shards (S1, S2)
  • 1 copy of each shard (replica set secondary) (RS1, RS2)
  • 1 arbiter for each shard (RA1, RA2)

You could then divide your server configuration as shown below.

Instance 1 : Runs : S1 (Primary Node)
Instance 2 : Runs : S2 (Primary Node)
Instance 3 : Runs : RS1 (Secondary Node S1) and RA2 (Arbiter Node S2)
Instance 4 : Runs : RS2 (Secondary Node S2) and RA1 (Arbiter Node S1)

You could run the arbiter nodes alongside your secondary nodes, which would help with elections during failovers.
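As a rough sketch (host names and ports are placeholders), shard S1's replica set could be initiated as below, with S2 mirrored the same way across instances 2, 4 and 3:

rs.initiate({
  _id: "shard1rs",
  members: [
    { _id: 0, host: "instance-1:27018" },                    // S1 primary
    { _id: 1, host: "instance-3:27018" },                    // RS1 secondary
    { _id: 2, host: "instance-4:27020", arbiterOnly: true }  // RA1 arbiter, co-located with RS2
  ]
})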
