Do you need to run RAID 10 on Mongo when using Provisioned IOPS on Amazon EBS?

Original post: 2013-10-03 18:06:25 · 5 2 · mongodb, amazon-web-services

I'm trying to set up a production Mongo system on Amazon to use as a datastore for a realtime metrics system.

I initially used the MongoDB AMIs [1] in the Marketplace, but I'm confused because they come with only one data EBS volume. I've read that Mongo recommends RAID 10 on EBS storage (8 EBS volumes on each server). I've also read that the bare minimum for production is a primary and a secondary with an arbiter. Is RAID 10 still the recommended setup, or is one Provisioned IOPS EBS volume sufficient?

Please advise. We are a small shop, so what is the bare minimum we can get away with and still be reasonably safe?

[1] MongoDB 2.4 with 1000 IOPS - data: 200 GB @ 1000 IOPS, journal: 25 GB @ 250 IOPS, log: 10 GB @ 100 IOPS

2 answers

So, I just got off a call with an Amazon System Engineer, and he had some interesting insights related to this question.

First off, if you are going to use RAID, he said to simply do striping, as the EBS blocks are mirrored behind the scenes anyway, so RAID 10 seemed like overkill to him.
Standard EBS volumes tend to handle spiky traffic well (a volume may be able to handle 1K-2K IOPS for a few seconds), but eventually throughput tails off to an average of 100 IOPS. One suggestion was to use many small EBS volumes and stripe them to get better IOPS throughput.
Some of his customers use just the ephemeral storage on the EC2 instances, but then have multiple (3-5) nodes in the availability set. The ephemeral storage is the storage on the physical machine. Apparently, if you use an EC2 instance with SSD storage, you can get up to 20K IOPS.
Some customers will run a huge EC2 instance with SSD for the primary, then a smaller EC2 instance with EBS for the secondary. The primary machine is performant; the failover is available, but with degraded performance.
Make sure you check 'EBS Optimized' when you spin up an instance. That means you have a dedicated channel to the EBS storage (of any kind) instead of sharing the NIC.
Important! Provisioned IOPS EBS is expensive, and the bill does not stop when you shut down the EC2 instances the volumes are attached to. (This sucks while you are testing.) His advice was to take a snapshot of the EBS volumes, then delete them. When you need them again, just create new Provisioned IOPS EBS volumes from the snapshot, then reconfigure your EC2 instances to attach the new storage. (It's more work than it should be, but it's worth it not to get sucker punched with the IOPS bill.)
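
The snapshot-then-delete cycle from the last point could be scripted roughly as follows. This is only a sketch, not something the engineer provided: it assumes the boto3 EC2 client, assumes the volume has already been detached, and uses the `io1` volume type for Provisioned IOPS. The helper names `park_volume` and `restore_volume`, and all IDs in the comments, are hypothetical.

```python
def park_volume(ec2, volume_id, description="parked Provisioned IOPS volume"):
    """Snapshot a detached EBS volume, then delete it so it stops billing.

    `ec2` is expected to be a boto3 EC2 client, e.g. boto3.client("ec2").
    Returns the ID of the snapshot that now holds the data.
    """
    snap = ec2.create_snapshot(VolumeId=volume_id, Description=description)
    snapshot_id = snap["SnapshotId"]
    # Do not delete the source volume until the snapshot has completed.
    ec2.get_waiter("snapshot_completed").wait(SnapshotIds=[snapshot_id])
    ec2.delete_volume(VolumeId=volume_id)
    return snapshot_id


def restore_volume(ec2, snapshot_id, availability_zone, iops, instance_id, device):
    """Create a new Provisioned IOPS volume from a snapshot and attach it.

    The volume must be created in the same availability zone as the instance.
    Returns the ID of the new volume.
    """
    vol = ec2.create_volume(
        AvailabilityZone=availability_zone,
        SnapshotId=snapshot_id,
        VolumeType="io1",  # assumption: io1 is the Provisioned IOPS type in use
        Iops=iops,
    )
    volume_id = vol["VolumeId"]
    # The volume has to reach the "available" state before it can be attached.
    ec2.get_waiter("volume_available").wait(VolumeIds=[volume_id])
    ec2.attach_volume(Device=device, InstanceId=instance_id, VolumeId=volume_id)
    return volume_id


# Example usage (all IDs below are placeholders):
#   import boto3
#   ec2 = boto3.client("ec2")
#   snap_id = park_volume(ec2, "vol-0123456789abcdef0")
#   restore_volume(ec2, snap_id, "us-east-1a", 1000,
#                  "i-0123456789abcdef0", "/dev/sdf")
```

Keep in mind that the snapshot itself is still billed for storage while the volume is parked, and that a separate snapshot is needed for each volume (data, journal, log) if they are split as in the AMI from the question.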

I've got the same question. Both Amazon and MongoDB market Provisioned IOPS heavily, touting its advantages over a standard EBS volume. We run prod on m2.4xlarge AWS instances, with 1 primary and 2 secondaries per service. In the most heavily utilized service cluster, apart from a few slow queries, the monitoring charts do not reveal any drop in performance at all. Page faults are rare, and even then only between 0.0001 and 0.0004 faults, once or twice a day. Background flushes take milliseconds, and locks and queues are so far at manageable levels. I/O wait on the primary node ranges between 0 and 2% at any time, mostly under 1%, and %idle stays steadily above the 90% mark. Do I still need to consider Provisioned IOPS, given that we still have budget to address any potential performance drag? Any guidance will be appreciated.
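
For anyone wanting to reproduce the I/O wait and %idle figures quoted above without a monitoring tool, they can be derived on Linux from two samples of the aggregate `cpu` line in `/proc/stat`. The sketch below shows the arithmetic only; the function name `cpu_percentages` and the two sample lines are made up for illustration and are not measurements from the cluster described here.

```python
def cpu_percentages(before, after):
    """Return (%iowait, %idle) between two aggregate 'cpu' lines of /proc/stat."""

    def fields(line):
        parts = line.split()
        if not parts or parts[0] != "cpu":
            raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
        return [int(x) for x in parts[1:]]

    deltas = [b - a for a, b in zip(fields(before), fields(after))]
    # Field order: user nice system idle iowait irq softirq steal guest guest_nice.
    # guest and guest_nice are already counted inside user and nice,
    # so only the first eight fields are summed for the total.
    total = sum(deltas[:8])
    if total <= 0:
        raise ValueError("samples must be taken at two different times")
    idle, iowait = deltas[3], deltas[4]
    return 100.0 * iowait / total, 100.0 * idle / total


# Two hypothetical samples taken a few seconds apart.
sample_1 = "cpu 1000 0 500 8000 100 0 0 0 0 0"
sample_2 = "cpu 1050 0 520 8920 110 0 0 0 0 0"

iowait_pct, idle_pct = cpu_percentages(sample_1, sample_2)
print(f"%iowait={iowait_pct:.1f} %idle={idle_pct:.1f}")  # %iowait=1.0 %idle=92.0
```

Sustained I/O wait well above these levels would be the kind of signal that suggests the disk is the bottleneck; values like the ones in this post do not.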