
Backup and restore of a 4-node Cassandra cluster

I have a 4-node Cassandra cluster. Each node holds 50% of the data. Can anyone suggest the best way to take a backup so that a restore brings back all of the data?

Thanks for your help.

Best practice is to create a snapshot, which backs up all your existing data by creating hard links to the SSTables (Cassandra's data files). What other threads don't seem to mention is that you also want to back up your schema. This can be done with cqlsh's DESCRIBE command, e.g. (shown here on a system table; run it against your own keyspaces or tables):

DESCRIBE TABLE system.schema_columns;

CREATE TABLE system.schema_columns (
    keyspace_name text,
// some output removed
    PRIMARY KEY (keyspace_name, columnfamily_name, column_name)
) WITH CLUSTERING ORDER BY (columnfamily_name ASC, column_name ASC)
// rest of output removed
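To capture the whole schema in one file rather than table by table, something like the following might work. It is a minimal sketch, assuming cqlsh is on the PATH and can reach a node; `CQLSH_HOST`, `SCHEMA_DIR` and the function names are hypothetical and not part of the original answer.

```shell
#!/bin/sh
# Sketch: dump the cluster schema to a dated .cql file.
# CQLSH_HOST and SCHEMA_DIR are hypothetical names; adjust them to your setup.
CQLSH_HOST="${CQLSH_HOST:-127.0.0.1}"
SCHEMA_DIR="${SCHEMA_DIR:-./schema-backups}"

schema_file() {
    # Print the output path for a given date stamp, e.g. 2024-01-31.
    printf '%s/schema-%s.cql\n' "$SCHEMA_DIR" "$1"
}

dump_schema() {
    out="$(schema_file "$(date +%Y-%m-%d)")"
    mkdir -p "$SCHEMA_DIR"
    # DESCRIBE SCHEMA prints the CREATE statements for all non-system keyspaces.
    cqlsh "$CQLSH_HOST" -e "DESCRIBE SCHEMA" > "$out"
}
```

Calling `dump_schema` after each schema change keeps a dated copy that can later be replayed with cqlsh before the data is restored.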

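Also, use a parallel SSH tool to create the snapshots on all your nodes at the same time (pssh is one popular choice). Because each node holds only part of the data, every node has to be snapshotted for the backup to be complete.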
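As a rough illustration, a parallel snapshot with pssh could be wrapped like this. The hosts file, the remote user and the function names are assumptions made for the example; on some distributions the pssh binary is installed as `parallel-ssh`.

```shell
#!/bin/sh
# Sketch: take a named snapshot on every node at once.
# HOSTS_FILE and SSH_USER are hypothetical; the hosts file lists one node per line.
HOSTS_FILE="${HOSTS_FILE:-./cassandra-hosts.txt}"
SSH_USER="${SSH_USER:-cassandra}"

snapshot_cmd() {
    # Command each node runs; -t names the snapshot so it can be found later.
    printf 'nodetool snapshot -t %s\n' "$1"
}

snapshot_all_nodes() {
    # -h: hosts file, -l: remote user, -i: print each node's output inline.
    pssh -h "$HOSTS_FILE" -l "$SSH_USER" -i "$(snapshot_cmd "$1")"
}
```

For example, `snapshot_all_nodes nightly` would ask every listed node to create a snapshot tagged `nightly` for all keyspaces.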

So to outline the process:

  1. Back up your schema (only needed again after a schema change such as an ALTER TABLE)
  2. Use pssh to take a snapshot on all nodes in parallel
  3. Copy the snapshots to a machine outside the Cassandra cluster (if the snapshots stay on the same machine as Cassandra, a hardware failure can cost you the snapshots and the node at the same time).
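Step 3 could look roughly like this when run on each node. It is only a sketch under several assumptions: the data directory is the package default `/var/lib/cassandra/data`, snapshots live under `<keyspace>/<table>/snapshots/<tag>`, GNU tar is available (for `-T -`), and `BACKUP_TARGET` is a hypothetical scp destination on a non-Cassandra machine.

```shell
#!/bin/sh
# Sketch only: BACKUP_TARGET and the default paths are hypothetical placeholders.
DATA_DIR="${DATA_DIR:-/var/lib/cassandra/data}"
BACKUP_TARGET="${BACKUP_TARGET:-backup@backup-host:/backups/cassandra}"

snapshot_dirs() {
    # Print every snapshot directory with the given tag under a data directory.
    # Usage: snapshot_dirs DATA_DIR TAG
    find "$1" -type d -path "*/snapshots/$2"
}

archive_snapshot() {
    # Pack all snapshot directories for TAG into one tarball, keeping the
    # keyspace/table layout as relative paths (assumes GNU tar for "-T -").
    # Usage: archive_snapshot DATA_DIR TAG OUTPUT_TARBALL (absolute path)
    ( cd "$1" && snapshot_dirs . "$2" | tar -czf "$3" -T - )
}

ship_snapshot() {
    # Archive this node's snapshot and copy it to the backup machine.
    tag="$1"
    tarball="/tmp/$(hostname)-$tag.tar.gz"
    archive_snapshot "$DATA_DIR" "$tag" "$tarball" && scp "$tarball" "$BACKUP_TARGET/"
}
```

Including the hostname in the tarball name keeps the archives from different nodes apart on the backup machine, which matters because each node's snapshot covers only its own share of the data.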

There is an overview of how to snapshot here and an overview of how to recover lost nodes using a snapshot here.

