
Using DSBulk for backup/restore takes too long

I use DSBulk for text-based backup and restore of a Cassandra cluster. I have written a Python script that backs up/restores all the tables in the cluster using dsbulk load/unload, but it takes a long time even for small amounts of data because a new session is created for each table (approx. 7 s). In my case there are 70 tables, so 70 × 7 s is added just for session creation. Is there a way to back up the data from all tables in a cluster using a single session with dsbulk? From the docs, I see dsbulk is suitable only for loading/unloading a single table at a time. Is there any alternative or other approach for this? Please suggest if any!
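For reference, a minimal sketch of the kind of per-table loop described above (the keyspace name, table list, and backup directory are placeholders, and it assumes dsbulk is on the PATH):

```python
import subprocess
from pathlib import Path

# Placeholders -- substitute your own keyspace, table list, and backup directory.
KEYSPACE = "my_keyspace"
TABLES = ["table_1", "table_2"]  # in the real script this would be all 70 tables
BACKUP_DIR = Path("/backups/cassandra")

def unload_table(keyspace: str, table: str) -> None:
    """Unload one table to CSV with dsbulk.

    Each call spawns a separate dsbulk process, which opens its own
    driver session -- that is where the ~7 s per-table overhead comes from.
    """
    out_dir = BACKUP_DIR / keyspace / table
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(
        ["dsbulk", "unload",
         "-k", keyspace,
         "-t", table,
         "-url", str(out_dir)],
        check=True,
    )

for table in TABLES:
    unload_table(KEYSPACE, table)
```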

Thanks!

No, there isn't a way to load/unload multiple tables in a single DSBulk execution because it doesn't make sense to do so.

In any case, unloading data to CSV isn't recommended as a means of backing up your cluster because there is no guarantee that the data will be consistent at a point in time.

The correct way of backing up a Cassandra cluster is with the nodetool snapshot command. For details, see Apache Cassandra Backups.
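For example, a snapshot of a single keyspace can be taken roughly like this (a minimal sketch; the keyspace name and snapshot tag are placeholders, it assumes nodetool is on the PATH, and it must be run on every node in the cluster since nodetool snapshot only covers the local node's data):

```python
import subprocess

# Placeholders -- substitute your own keyspace and snapshot tag.
KEYSPACE = "my_keyspace"
SNAPSHOT_TAG = "backup_2024_01_01"

# nodetool snapshot flushes memtables and hard-links the SSTables of the
# local node into each table's snapshots directory under the given tag.
subprocess.run(
    ["nodetool", "snapshot", "-t", SNAPSHOT_TAG, KEYSPACE],
    check=True,
)
```

The snapshot files then live under each table's data directory (data/&lt;keyspace&gt;/&lt;table-id&gt;/snapshots/&lt;tag&gt;), from where they can be copied off-node for safekeeping.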

If you're interested, there is an open-source tool which allows you to automate backups: https://github.com/thelastpickle/cassandra-medusa. Cheers!
