How to perform defragmentation on a Cassandra table

Original 2015-01-02 03:52:25 0 1 python/ cassandra/ defragmentation

I am playing around with Python and some NoSQL databases to create a file store (mainly because of the built-in replication). I tried it with MongoDB and it works, but because of MongoDB's "Write Greedy" nature I moved to Cassandra and implemented the same thing. While it's working, I want to know how to defragment the data in Cassandra (pointing me to the docs would be fine). I will explain with an example: say I upload a 200 MB file, then a 20 MB file. The data size in Cassandra is now ~220 MB. If I then delete the 200 MB file, I still see a data size of ~200 MB, so that space is not reclaimed. In MongoDB there is a command to reclaim the space (so that it is reused for new files); I want to know how the same can be achieved in Cassandra. I am also getting confused between compression and compaction.

To store the data, I am splitting each file into parts and storing every part as a "blob" in a table.
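
For readers following along, the chunked storage described above might look roughly like the sketch below. The chunk size, the function name, and the `file_chunks` table in the closing comment are assumptions for illustration only; the post does not show the actual schema or code.

```python
from typing import Iterator, Tuple

# Assumed part size (1 MiB); the original post does not state one.
CHUNK_SIZE = 1024 * 1024


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE) -> Iterator[Tuple[int, bytes]]:
    """Yield (part_number, chunk) pairs that cover `data` in order."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for part, offset in enumerate(range(0, len(data), chunk_size)):
        yield part, data[offset:offset + chunk_size]


# Each pair could then be bound to a hypothetical insert statement such as:
#   INSERT INTO file_chunks (file_id, part, data) VALUES (?, ?, ?)
# where `data` is a blob column. Deleting a file means deleting all of its parts.
```

Joining the chunks back in `part` order reproduces the original bytes, which is what a download path would do.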

1 answer

Cassandra cleans up deleted and expired data using a process called compaction.

While you can force a compaction yourself using nodetool compact, I would not recommend this; it is better to tune compaction and let it happen in the background.
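
If you do want to trigger a compaction by hand while experimenting, a minimal Python sketch could wrap the `nodetool compact [keyspace [table]]` command as below. The helper names and the `filestore` / `file_chunks` names in the usage note are hypothetical, and the sketch assumes `nodetool` is on the PATH of the node it runs on.

```python
import subprocess
from typing import List, Optional


def compact_command(keyspace: Optional[str] = None, table: Optional[str] = None) -> List[str]:
    """Build the argument list for `nodetool compact [keyspace [table]]`."""
    if table and not keyspace:
        raise ValueError("a table name requires a keyspace")
    cmd = ["nodetool", "compact"]
    if keyspace:
        cmd.append(keyspace)
    if table:
        cmd.append(table)
    return cmd


def run_compaction(keyspace: str, table: str) -> None:
    """Run a major compaction on the local node; raises if nodetool fails."""
    subprocess.run(compact_command(keyspace, table), check=True)
```

For example, `run_compaction("filestore", "file_chunks")` would run `nodetool compact filestore file_chunks` on the local node only; other nodes in the cluster are not affected.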

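That may not completely do the trick, as Cassandra has a table property named 'gc_grace_seconds' which prevents data marked as deleted (with a tombstone) from being physically removed until gc_grace_seconds has passed. The default is 10 days (864000 seconds), but you can set it to a smaller value, or even to 0 so that tombstones become eligible for removal at the next compaction. Be careful with low values on a multi-node cluster: if a replica misses the delete and is not repaired within gc_grace_seconds, the deleted data can reappear.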
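
As a rough illustration, gc_grace_seconds can be changed per table with an `ALTER TABLE ... WITH gc_grace_seconds = N` statement. The sketch below only builds that CQL string; the function name and the `filestore.file_chunks` table in the usage note are hypothetical, and the identifier check is deliberately limited to simple unquoted names.

```python
import re

# Simple unquoted CQL identifiers: a letter followed by letters, digits, or underscores.
_IDENTIFIER = re.compile(r"^[A-Za-z][A-Za-z0-9_]*$")


def gc_grace_statement(keyspace: str, table: str, seconds: int) -> str:
    """Return a CQL statement that sets gc_grace_seconds on keyspace.table."""
    for name in (keyspace, table):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    if seconds < 0:
        raise ValueError("gc_grace_seconds must not be negative")
    return f"ALTER TABLE {keyspace}.{table} WITH gc_grace_seconds = {seconds};"


# With the DataStax cassandra-driver, the string could be passed to
# session.execute(...) on a connected Session (connection setup not shown here).
```

For example, `gc_grace_statement("filestore", "file_chunks", 3600)` produces a statement that shortens the grace period to one hour. Space is still only reclaimed once a compaction actually rewrites the SSTables that hold the tombstones.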

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

solution1 0 ACCEPTED 2015-01-02 04:34:51
