
How to perform defragmentation on a Cassandra table

I am playing around with Python and a few NoSQL databases to build a file store (mainly because of the built-in replication). I tried it with MongoDB and it works, but because of MongoDB's "Write Greedy" nature I moved to Cassandra and implemented the same thing there. It works, but I want to know how to defragment the data in Cassandra (pointing me to the docs would be fine). To explain with an example: say I upload a 200 MB file, then a 20 MB file. The data size in Cassandra is now ~220MB. If I then delete the 200MB file, I still see a data size of ~200MB, so that space is not reclaimed. In MongoDB there is a command to reclaim the space (to reuse it for new files); I want to know how the same can be achieved in Cassandra. I am getting confused between compression and compaction.

To store the data, I split each file into parts and store each part as a "blob" in a table.

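Cassandra cleans up deleted and expired data using a process called compaction, which merges SSTables and drops data that is no longer needed. Compression is a separate feature: it only shrinks how SSTables are stored on disk and does not remove deleted data.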

While you can force a compaction yourself using nodetool compact, I would not recommend this; it is better to tune compaction and let it happen in the background.
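If you only want to confirm on a test node that a compaction frees the space, the manual command can be scripted from Python. This is a minimal sketch, not a recommended routine: the keyspace and table names used in the usage note (`filestore`, `chunks`) are hypothetical, and it assumes `nodetool` is on the PATH of the node where the script runs.

```python
import subprocess


def compact_command(keyspace, table):
    """Build the argument list for `nodetool compact <keyspace> <table>`."""
    return ["nodetool", "compact", keyspace, table]


def run_compaction(keyspace, table):
    """Run a manual compaction of one table on the local node.

    Raises subprocess.CalledProcessError if nodetool exits with an error.
    """
    subprocess.run(compact_command(keyspace, table), check=True)
```

Calling `run_compaction("filestore", "chunks")` would be equivalent to typing `nodetool compact filestore chunks` on that node. Deleted data that is still inside its grace period will not be dropped by this.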

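That may not completely do the trick, because Cassandra has a per-table property named gc_grace_seconds which prevents data marked as deleted (with a tombstone) from being removed until gc_grace_seconds has passed. The default is 10 days (864000 seconds), but you can configure a smaller value, or even set it to 0. Note that 0 does not disable tombstones altogether: deletes still write tombstones, they just become eligible for removal at the next compaction. On a multi-node cluster a very low value is risky, because a replica that missed the delete can bring the data back if the tombstone is purged before a repair has run.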
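To make the grace period concrete, here is a small sketch that builds the CQL statement for changing gc_grace_seconds and computes the earliest time a deleted row can be purged by compaction. The helper names and the `filestore.chunks` table are hypothetical; the generated statement would still need to be run through cqlsh or your driver.

```python
from datetime import datetime, timedelta, timezone


def alter_gc_grace_cql(keyspace, table, seconds):
    """Return a CQL statement that sets gc_grace_seconds on one table."""
    if seconds < 0:
        raise ValueError("gc_grace_seconds must be >= 0")
    return f"ALTER TABLE {keyspace}.{table} WITH gc_grace_seconds = {seconds};"


def earliest_purge_time(deleted_at, gc_grace_seconds):
    """Earliest moment a compaction may drop a tombstone written at deleted_at."""
    return deleted_at + timedelta(seconds=gc_grace_seconds)


# Hypothetical table, with the grace period lowered to one hour.
statement = alter_gc_grace_cql("filestore", "chunks", 3600)

# With the default of 864000 seconds, a delete made on 1 January
# cannot be purged before 11 January.
deleted = datetime(2024, 1, 1, tzinfo=timezone.utc)
purgeable = earliest_purge_time(deleted, 864000)
```

The space is only reclaimed once a compaction actually runs after that time, so disk usage can stay high for a while even after the grace period has passed.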

