
Cassandra on AWS

Original post: 2015-04-29 11:59:06, tags: amazon-web-services, cassandra

I'm new to both AWS and Cassandra. I just read about the EBS and S3 storage available in AWS. I'm trying to figure out which storage Cassandra would use if we install it on EC2: EBS or S3? Or is there some other storage? I'm a little confused by this. Please help me understand.

Thanks, Aravind

3 Answers

For Cassandra you need to use EBS. S3 is an object store with an API to store and retrieve objects, but it has no easy querying mechanism. Its use cases include backup and archiving, disaster recovery, static website hosting, etc.

However, you can use S3 for Cassandra backups.
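As a rough illustration of the S3 backup idea, here is a minimal Python sketch. It assumes a snapshot has already been taken on the node (for example with `nodetool snapshot -t <tag>`) and that the data directory follows the usual `<keyspace>/<table>/snapshots/<tag>/` layout. The function names, the bucket name, and the key prefix are hypothetical, and `s3_client` is expected to behave like a boto3 S3 client, whose `upload_file(Filename, Bucket, Key)` method is the only call used.

```python
import os


def plan_snapshot_uploads(data_dir, snapshot_tag, key_prefix):
    """List (local_path, s3_key) pairs for every file of one snapshot tag.

    Assumes the layout <data_dir>/<keyspace>/<table>/snapshots/<tag>/...
    """
    uploads = []
    for root, _dirs, files in os.walk(data_dir):
        parts = os.path.relpath(root, data_dir).split(os.sep)
        # Keep only directories at or below <keyspace>/<table>/snapshots/<tag>.
        if len(parts) >= 4 and parts[2] == "snapshots" and parts[3] == snapshot_tag:
            for name in files:
                local_path = os.path.join(root, name)
                rel = os.path.relpath(local_path, data_dir).replace(os.sep, "/")
                uploads.append((local_path, f"{key_prefix}/{rel}"))
    return sorted(uploads)


def upload_snapshot(s3_client, bucket, data_dir, snapshot_tag, key_prefix):
    """Upload one snapshot to S3 and return the number of files sent."""
    uploads = plan_snapshot_uploads(data_dir, snapshot_tag, key_prefix)
    for local_path, key in uploads:
        # upload_file(Filename, Bucket, Key) is the boto3 S3 client call.
        s3_client.upload_file(local_path, bucket, key)
    return len(uploads)
```

With boto3 installed and credentials configured, a call might look like `upload_snapshot(boto3.client("s3"), "my-backup-bucket", "/var/lib/cassandra/data", "nightly", "backups/node1")`; every argument there is a placeholder. Separating the planning step from the upload makes it possible to inspect what would be sent before touching S3.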

You can also consider ephemeral disks (as Jeff mentions), that is, the storage that comes with the AWS instance.

You shouldn't run Cassandra on EBS, per the recommendation of DataStax itself:

"EBS volumes are not recommended for Cassandra data volumes for the following reasons:

EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link. EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive. Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing."

http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html

The answer above is based on the documentation for Cassandra 1.2, a relatively old version. Documentation for newer versions of Cassandra indicates that EBS-optimized instances using GP2 SSD volumes can be used for production workloads.

http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningEC2.html

Two things have changed since then: the introduction of EBS-optimized instances, which reduce or eliminate noisy-neighbor throughput problems, and the availability of GP2 SSD volumes for EBS storage.

If you are just getting started, I would recommend EBS-optimized instances. The performance should be pretty good, and you gain a critical ability: creating snapshots. This reduces the risk posed by an instance becoming unstable, because you would have S3-backed volume snapshots from which AWS can rebuild the data if a drive died.

It also reduces the need to set up your Cassandra cluster across regions. One concern you have to design around when using ephemeral storage is a whole region going down, which could wipe out your entire cluster if you didn't build a multi-region cluster. With EBS, this isn't really a concern.
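To make the snapshot point concrete, here is a minimal Python sketch of requesting an EBS snapshot for a node's data volume. It is an illustration under assumptions, not a recommended backup procedure: the function names, the `cassandra-node` tag key, and the description format are all hypothetical, and `ec2_client` is expected to behave like a boto3 EC2 client, whose `create_snapshot` call accepts `VolumeId`, `Description`, and `TagSpecifications` and returns a response containing `SnapshotId`.

```python
from datetime import datetime, timezone


def build_snapshot_request(volume_id, node_name, taken_at):
    """Build keyword arguments for an EC2 create_snapshot call."""
    stamp = taken_at.strftime("%Y-%m-%dT%H:%M:%SZ")
    return {
        "VolumeId": volume_id,
        "Description": f"Cassandra data volume of {node_name} at {stamp}",
        "TagSpecifications": [
            {
                "ResourceType": "snapshot",
                # Hypothetical tag key, used here only to find snapshots later.
                "Tags": [{"Key": "cassandra-node", "Value": node_name}],
            }
        ],
    }


def snapshot_data_volume(ec2_client, volume_id, node_name, taken_at=None):
    """Request a snapshot of one EBS volume and return its snapshot ID."""
    taken_at = taken_at or datetime.now(timezone.utc)
    request = build_snapshot_request(volume_id, node_name, taken_at)
    response = ec2_client.create_snapshot(**request)
    return response["SnapshotId"]
```

With boto3 installed and credentials configured, a call might look like `snapshot_data_volume(boto3.client("ec2"), "vol-0123456789abcdef0", "node-1")`, where the volume ID and node name are placeholders.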