
Backing up a Cassandra cluster with snapshots and uploading to S3/VM?

Is backing up Cassandra using snapshots and uploading them a common thing to do with a cluster?

I was thinking of having a cron job on each node take a snapshot, tar it, and upload it every 24 hours, but I am a bit worried about the performance implications. Once the data on a node gets big, couldn't this cripple it?

The backups created by nodetool snapshot in Cassandra are hard links to the existing SSTable files, so taking one is cheap and it initially uses effectively no more space than the original files. See this post for an explanation of hard / soft links:
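To make the hard-link point concrete, here is a minimal Python sketch (not Cassandra code). It creates a file and a hard link to it in a temporary directory, then checks that both names refer to the same inode. The file names are made up for illustration, and it assumes a filesystem that supports hard links.

```python
import os
import tempfile

def hard_link_shares_data(directory):
    """Create a file plus a hard link to it; return (same_inode, link_count)."""
    # Both file names are hypothetical stand-ins for an SSTable and its snapshot entry.
    original = os.path.join(directory, "sstable-Data.db")
    link = os.path.join(directory, "snapshot-Data.db")
    with open(original, "wb") as f:
        f.write(b"x" * 4096)
    # os.link adds a second directory entry for the same data; nothing is copied.
    os.link(original, link)
    a, b = os.stat(original), os.stat(link)
    return a.st_ino == b.st_ino, a.st_nlink

with tempfile.TemporaryDirectory() as d:
    same_inode, link_count = hard_link_shares_data(d)
    print("same inode:", same_inode, "link count:", link_count)
```

Because both names point at one inode, the data is stored once no matter how many links exist.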

https://askubuntu.com/questions/108771/what-is-the-difference-between-a-hard-link-and-a-symbolic-link

However, if you do not clear old snapshots with nodetool clearsnapshot, disk usage on the cluster will grow over time: once compaction replaces the original SSTables, the snapshot's hard links keep the old files on disk. The Cassandra docs cover clearing snapshots.
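A snapshot, upload, clear cycle along these lines could be sketched as follows. This is only an outline under assumptions: the tag naming scheme, the function names, and the upload callback are all hypothetical, and the only nodetool options used are -t on snapshot and clearsnapshot.

```python
import subprocess
from datetime import date

def daily_tag(day):
    """Hypothetical tag scheme, e.g. daily-2024-01-02."""
    return "daily-" + day.isoformat()

def snapshot_cmd(tag, keyspace=None):
    """Build the argv for: nodetool snapshot -t <tag> [keyspace]."""
    cmd = ["nodetool", "snapshot", "-t", tag]
    if keyspace:
        cmd.append(keyspace)
    return cmd

def clearsnapshot_cmd(tag):
    """Build the argv for: nodetool clearsnapshot -t <tag>."""
    return ["nodetool", "clearsnapshot", "-t", tag]

def run_cmd(cmd):
    """Run a command and raise if it exits non-zero."""
    subprocess.run(cmd, check=True)

def backup_cycle(tag, upload, runner=run_cmd):
    """Take a snapshot, hand the tag to an upload callback, then clear the snapshot.

    `upload` is a placeholder for whatever copies the snapshot files off the node.
    The snapshot is cleared only if the upload did not raise.
    """
    runner(snapshot_cmd(tag))
    upload(tag)
    runner(clearsnapshot_cmd(tag))
```

A cron job on each node might call backup_cycle(daily_tag(date.today()), upload) once a day, where upload is your own function.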

Incidentally, nodetool tablestats (formerly nodetool cfstats) is very useful for seeing how much snapshot data you're using on a given node for a given table.
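If you want to track that number from a script, a rough sketch is to sum the snapshot figures in the tablestats output. This assumes the output contains lines of the form "Space used by snapshots (total): <bytes>" and that it was produced without human-readable units. Check the output of your Cassandra version before relying on it. The function name is made up for illustration.

```python
import re

# Assumed label; verify against the tablestats output of your Cassandra version.
SNAPSHOT_LINE = re.compile(r"Space used by snapshots \(total\):\s*(\d+)")

def snapshot_bytes(tablestats_output):
    """Sum the per-table snapshot sizes (in bytes) found in tablestats text."""
    return sum(int(size) for size in SNAPSHOT_LINE.findall(tablestats_output))
```

The input would be the captured text of nodetool tablestats for a keyspace or table.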

There are two kinds of backup strategy: full backups and incremental backups. Once you have taken a full backup, enable incremental backups on each node. You can then set up one cron job to sync all incremental backups to S3. (The full backup plus all incremental backups taken after it make an up-to-date backup.)

You can then have another cron job, run perhaps only on weekends or once a month, that removes all previous backups and takes a new full backup.
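The incremental sync job described above could be sketched like this. It assumes incremental backups are enabled (incremental_backups: true in cassandra.yaml, or nodetool enablebackup), so that Cassandra hard-links flushed SSTables into a backups directory under each table directory. The bucket name, node name, and S3 key layout are hypothetical, and the sketch only builds aws s3 sync commands rather than running them.

```python
import os

def find_backup_dirs(data_dir):
    """Return sorted <data_dir>/<keyspace>/<table>/backups directories that exist."""
    found = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            backups = os.path.join(ks_path, table, "backups")
            if os.path.isdir(backups):
                found.append(backups)
    return found

def s3_sync_cmd(backup_dir, data_dir, bucket, node):
    """Build the argv for: aws s3 sync <backup_dir> s3://<bucket>/<node>/<relative path>.

    The bucket/node/key layout is an illustrative choice, not a convention.
    """
    rel = os.path.relpath(backup_dir, data_dir).replace(os.sep, "/")
    return ["aws", "s3", "sync", backup_dir, "s3://{}/{}/{}".format(bucket, node, rel)]

def sync_commands(data_dir, bucket, node):
    """One sync command per table that currently has a backups directory."""
    return [s3_sync_cmd(d, data_dir, bucket, node) for d in find_backup_dirs(data_dir)]
```

The cron job would run each returned command, for example with subprocess.run(cmd, check=True). The periodic full-backup job would then delete the already-synced files from the backups directories, since Cassandra does not remove them itself.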


 