
Is it possible to back up and restore a Cassandra cluster using dsbulk?

I searched the internet extensively and found many ways to back up and restore a Cassandra cluster, such as nodetool snapshot and Medusa. My question is: can I use dsbulk to back up a Cassandra cluster? What are its limitations? Why doesn't anyone suggest it?

It's possible to use it in some cases, but it's not practical, mainly for the following reasons (the list could be longer):

  • DSBulk puts additional load on the cluster nodes because it goes through the standard read path. In contrast, nodetool snapshot just creates hard links to the data files, so it puts no additional load on the nodes.
  • It's harder to implement incremental backups with DSBulk: you need to come up with a SELECT condition that finds only the data changed since the last backup, so you need a timestamp column, because you can't use the value of the writetime function in a WHERE condition. It still requires rescanning all the data anyway, and it's impossible to find out which data was deleted. With nodetool snapshot, you just compare which files have changed since the last backup and back up only those.
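The file comparison in the last point could look roughly like the sketch below. It is only an illustration, not part of nodetool or Medusa: the function name `files_to_back_up` and the example file names are hypothetical, and it assumes you keep a record of the file names copied by the previous backup. It relies on SSTable files being immutable once written, so a file name that was already backed up does not need to be copied again.

```python
from pathlib import Path


def files_to_back_up(snapshot_dir, already_backed_up):
    """Return names of snapshot files that the previous backup did not copy.

    snapshot_dir: directory created by `nodetool snapshot` for one table
                  (assumed layout: .../<keyspace>/<table>/snapshots/<tag>/).
    already_backed_up: set of file names recorded by the previous backup
                       (hypothetical bookkeeping kept by your backup script).
    """
    snapshot = Path(snapshot_dir)
    # SSTable files never change after they are written, so comparing
    # names is enough; no need to read or checksum the data itself.
    return sorted(
        p.name
        for p in snapshot.iterdir()
        if p.is_file() and p.name not in already_backed_up
    )
```

For example, if the previous backup recorded `{"nb-1-big-Data.db"}` and the new snapshot directory also contains `nb-2-big-Data.db`, only the second file would be returned for copying. No query against the cluster is involved, which is why this approach avoids the read-path load described above.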


 