简体   繁体   中英

Cassandra on AWS

I'm new to AWS and also to Cassandra. I just read about EBS and S3 storage available in AWS. I was trying to figure out if we have Cassandra installed in EC2, which storage would it use? EBS or S3? Or is there other storage? I'm little confused with this. Please help me understand this.

Thanks Aravind

For Cassandra you need to use EBS. S3 is an object store with and API to store and retrieve objects, but not easy querying mechanisms. The use cases include backup and archiving, Disaster Recovery, Static Website Hosting, etc

However, you can use S3 for Cassandra backup .

You can also consider ephemeral disks (as Jeff mentions) and storage which comes with AWS instance.

You shouldn't run Cassandra on EBS, as recommended per Datastax itself :

"EBS volumes are not recommended for Cassandra data volumes for the following reasons:

EBS volumes contend directly for network throughput with standard packets. This means that EBS throughput is likely to fail if you saturate a network link. EBS volumes have unreliable performance. I/O performance can be exceptionally slow, causing the system to back load reads and writes until the entire cluster becomes unresponsive. Adding capacity by increasing the number of EBS volumes per host does not scale. You can easily surpass the ability of the system to keep effective buffer caches and concurrently serve requests for all of the data it is responsible for managing."

http://docs.datastax.com/en/cassandra/1.2/cassandra/architecture/architecturePlanningEC2_c.html

The answer above comes from Cassandra 1.2, a relatively old version. Documentation for newer versions of Cassandra indicate that EBS Optimized instances using GP2 SSD can be used for production workloads.

http://docs.datastax.com/en/cassandra/3.x/cassandra/planning/planPlanningEC2.html

Things that changed since then were the creation of EBS Optimized instances, which reduces and/or eliminates noisy neighbor throughput problems, and using GP2 SSD for EBS storage.

If you are just getting started, I would recommend EBS Optimized. The performance should be pretty good, but you gain a critical ability -> creating snapshots. This reduces the risk of your instance becoming unstable because you would have S3-backed volume snapshots for AWS to rebuild data from if a drive died.

This reduces the need to setup your Cassandra cluster across regions. One of the concerns that you have to build around when using Ephemeral is a whole region potentially going down, which could wipe out your entire cluster if you didn't build a multi-region cluster. With EBS, this isn't really a concern.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM