How to set TTL on a Cassandra SSTable

We are using Cassandra 3.10 with a 6-node cluster.

Lately, we noticed that our data volume has increased drastically, by approximately 4 GB per day on each node. We want to implement a more aggressive retention policy: change the compaction strategy to TWCS with a 1-hour window size and set a TTL of a few days. This can be achieved via the table properties.
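As a rough sketch of what "via the table properties" could look like, the Python helper below only builds the CQL text for an `ALTER TABLE` that switches to TWCS with a 1-hour window and sets a table-level default TTL. The table name `metrics.events` and the 3-day TTL are assumptions for illustration, and the statement would still have to be run through cqlsh or a driver.

```python
def build_retention_cql(table: str, ttl_days: int, window_hours: int = 1) -> str:
    """Build an ALTER TABLE statement that enables TWCS and a default TTL."""
    if ttl_days <= 0 or window_hours <= 0:
        raise ValueError("ttl_days and window_hours must be positive")
    ttl_seconds = ttl_days * 24 * 60 * 60  # default_time_to_live is given in seconds
    return (
        f"ALTER TABLE {table} WITH compaction = {{"
        "'class': 'TimeWindowCompactionStrategy', "
        "'compaction_window_unit': 'HOURS', "
        f"'compaction_window_size': '{window_hours}'}} "
        f"AND default_time_to_live = {ttl_seconds};"
    )


if __name__ == "__main__":
    # 'metrics.events' is a hypothetical keyspace.table name.
    print(build_retention_cql("metrics.events", ttl_days=3))
```

Note that `default_time_to_live` only applies to data written after the change; rows already on disk keep whatever TTL they were written with.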

Since the ETL should be a slow process in order to lighten the Cassandra workload, it is possible that it will not finish extracting all the data before the TTL expires. So I wanted to know: is there a way for the ETL process to set TTL=0 on an entire SSTable once it is done extracting it?

Expired data (remaining TTL = 0) is read as a tombstone. When it is next compacted, it will either be written out as a tombstone or purged, depending on your gc_grace. Other than the overhead of writing the tombstones, it might be easier just to do a delete, or to create sstables that contain the necessary tombstones, than to rewrite all the existing sstables. Whether range or point tombstones are more efficient will depend on your version and schema.
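To make the "just do a delete" suggestion a little more concrete, here is a minimal sketch that builds the CQL for either a whole-partition delete or a range delete over a clustering column, which the ETL could issue once it has finished a slice. The table `metrics.events` and the columns `sensor_id` (partition key) and `event_time` (clustering column) are hypothetical, and the `?` markers are bind placeholders for a prepared statement.

```python
from typing import Optional


def build_delete_cql(table: str, partition_col: str,
                     clustering_col: Optional[str] = None) -> str:
    """Return a parameterised DELETE for a partition or a clustering range."""
    cql = f"DELETE FROM {table} WHERE {partition_col} = ?"
    if clustering_col is not None:
        # Range delete: covers rows with start <= clustering_col < end.
        cql += f" AND {clustering_col} >= ? AND {clustering_col} < ?"
    return cql + ";"


if __name__ == "__main__":
    # Hypothetical schema: PRIMARY KEY (sensor_id, event_time)
    print(build_delete_cql("metrics.events", "sensor_id"))
    print(build_delete_cql("metrics.events", "sensor_id", "event_time"))
```

Which of the two shapes is cheaper is exactly the version- and schema-dependent question raised above, so this is only a starting point for experimenting.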

An option that might be easiest is to use a different compaction strategy altogether, or a custom one like https://github.com/protectwise/cassandra-util/tree/master/deleting-compaction-strategy . You can then purge the data that has already been processed when compactions run. How hard it would be to mark what has and has not been processed still depends quite a bit on your schema.

You should set TTL 0 at the table and query level as well. Once the TTL expires, the data is converted to tombstones. Depending on the gc_grace_seconds value, the next compaction will clear all the tombstones. You may also run a major compaction to clear tombstones, but depending on the compaction strategy this is not recommended in Cassandra. With STCS, at least 50% free disk space is required to run a healthy compaction.
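The role of gc_grace_seconds can be illustrated with a deliberately simplified model: a tombstone only becomes eligible for removal once it is older than gc_grace_seconds. The function name and the example timestamps below are made up for illustration, and real compaction applies further checks (for example, whether other sstables still hold overlapping older data) that this sketch ignores.

```python
def tombstone_purgeable(deletion_time: int, gc_grace_seconds: int, now: int) -> bool:
    """Simplified check: has the tombstone outlived gc_grace_seconds?

    All arguments are epoch seconds (or a duration in seconds).
    """
    return deletion_time + gc_grace_seconds < now


if __name__ == "__main__":
    ten_days = 10 * 24 * 60 * 60   # 864000, the default gc_grace_seconds
    deleted_at = 1_700_000_000     # arbitrary example timestamp
    # One hour after the delete: still within gc_grace, tombstone is kept.
    print(tombstone_purgeable(deleted_at, ten_days, deleted_at + 3600))
    # Just past gc_grace: a later compaction may drop it.
    print(tombstone_purgeable(deleted_at, ten_days, deleted_at + ten_days + 1))
```

Lowering gc_grace_seconds shortens this wait, at the cost of a smaller window in which repairs must complete before deleted data can reappear.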
