
STCS: how can I improve compaction performance?

I have a six-node Cassandra cluster that hosts a large column family (CQL table). The table is immutable because, from the application's point of view, it is a kind of history table. It holds about 400 GB of compressed data, which is not that much!

After truncating the table and ingesting the application's history data into it, I run nodetool compact on the table on each node. The goal is the best possible read performance, by reducing the number of SSTables. The compaction strategy is STCS.

After running nodetool compact, I run nodetool compactionstats to follow the compaction progress:

 id  compaction type    keyspace        table            completed total    unit  progress
 xxx Compaction         mykeyspace      mytable          3.65 GiB  1.11 TiB bytes 0.32%

Several hours later, on the same node, I get:

 id  compaction type    keyspace        table            completed total    unit  progress
 xxx Compaction         mykeyspace      mytable          4.08 GiB  1.11 TiB bytes 0.36%

So the compaction process seems to work, but it is terribly slow.
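
To put "terribly slow" into numbers, here is a small Python sketch that turns the two compactionstats snapshots into a rate and a rough time to completion. The question only says "after hours", so ELAPSED_HOURS = 2.0 is a hypothetical value, not a measurement. The helper names (to_bytes, estimate) are illustrative too.

```python
# Sketch: estimate compaction rate and remaining time from two
# `nodetool compactionstats` snapshots of the same compaction.
# ASSUMPTION: the elapsed time between the snapshots is not given in the
# question ("after hours"); ELAPSED_HOURS below is a hypothetical value.

UNITS = {"bytes": 1, "KiB": 1024, "MiB": 1024**2, "GiB": 1024**3, "TiB": 1024**4}


def to_bytes(size: str) -> float:
    """Convert a human-readable size such as '3.65 GiB' to bytes."""
    value, unit = size.split()
    return float(value) * UNITS[unit]


def estimate(first: str, second: str, total: str, elapsed_hours: float):
    """Return (rate in MiB/s, estimated remaining hours)."""
    done_delta = to_bytes(second) - to_bytes(first)
    rate = done_delta / (elapsed_hours * 3600)  # bytes per second
    remaining = to_bytes(total) - to_bytes(second)
    return rate / UNITS["MiB"], remaining / rate / 3600


ELAPSED_HOURS = 2.0  # hypothetical: the question only says "after hours"

rate_mib_s, hours_left = estimate("3.65 GiB", "4.08 GiB", "1.11 TiB", ELAPSED_HOURS)
print(f"progress: {to_bytes('4.08 GiB') / to_bytes('1.11 TiB'):.2%}")
print(f"rate: {rate_mib_s:.3f} MiB/s, remaining: {hours_left:,.0f} h")
```

The rate scales inversely with ELAPSED_HOURS and the remaining time scales proportionally, so plug in the real gap between the two snapshots.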

Even after nodetool setcompactionthreshold -- 0, the compaction remains terribly slow. Moreover, the CPU seems to be at 100% because of this compaction.

Questions:

  1. Which configuration parameters can I tune to try to boost compaction performance?
  2. Could the 100% CPU usage during compaction be related to GC pressure?
  3. If compaction is too slow, is it relevant to add more nodes, or to add more CPU/RAM to each node? Could that help?

Compaction performance depends on the underlying hardware, for example on what kind of disks are used. But it also depends on how many compaction threads are allowed to run, and on what throughput is configured for them. From the command line, compaction throughput is configured with nodetool setcompactionthroughput, not with nodetool setcompactionthreshold as you used. The number of concurrent compactors is set with nodetool setconcurrentcompactors (but it is only available since 3.1, IIRC). You can also configure the default values in cassandra.yaml.
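
One rough way to reason about the throughput setting: if the throttle were the only limit, compacting the 1.11 TiB reported by compactionstats could not finish faster than total / throughput. The Python sketch below assumes the throttle unit can be treated as MiB/s and that 0 means unthrottled (as with nodetool setcompactionthroughput 0). 16 is the default of compaction_throughput_mb_per_sec in the cassandra.yaml shipped with 3.x; the other values are arbitrary examples, and min_hours is a hypothetical helper.

```python
# Sketch: lower bound on how long compacting `total` bytes takes when
# compaction I/O is throttled to `throughput_mib_s` (the value set with
# `nodetool setcompactionthroughput` or in cassandra.yaml).
# ASSUMPTIONS: the throttle is the only bottleneck (no CPU or disk limit),
# its unit is treated as MiB/s, and 0 means "unthrottled".
from typing import Optional

TIB = 1024**4
MIB = 1024**2


def min_hours(total_bytes: float, throughput_mib_s: float) -> Optional[float]:
    """Return the minimum duration in hours, or None if unthrottled."""
    if throughput_mib_s == 0:
        return None  # no cap: duration is then bounded by CPU/disk instead
    return total_bytes / (throughput_mib_s * MIB) / 3600


total = 1.11 * TIB  # "total" column from the compactionstats output above
for cap in (16, 64, 256, 0):
    hours = min_hours(total, cap)
    label = "unthrottled" if hours is None else f"at least {hours:.1f} h"
    print(f"throughput {cap:>3} MiB/s: {label}")
```

If the observed rate is far below this bound, the throttle is probably not what limits the compaction, and CPU or disk is the more likely bottleneck.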

So if you have enough CPU power and good SSDs, you can bump up both the compaction throughput and the number of compactors.
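
For the number of compactors, the comments in the 3.x cassandra.yaml (to my knowledge) describe the default of concurrent_compactors as the smaller of the number of disks and the number of cores, clamped between 2 and 8. The same comments suggest raising it to the number of cores when the data directories are on SSD. A hypothetical helper expressing that rule of thumb, not Cassandra's own code:

```python
# Sketch of the default `concurrent_compactors` rule as described in the
# comments of cassandra.yaml (3.x): min(disks, cores), clamped to [2, 8].
# The SSD branch follows the same file's advice to raise it to the core
# count. `suggested_compactors` is a hypothetical name; treat the result
# as a rough guide only.


def suggested_compactors(disks: int, cores: int, ssd: bool) -> int:
    if ssd:
        return cores  # yaml comment: on SSD, increase to the number of cores
    return max(2, min(8, min(disks, cores)))


for disks, cores, ssd in [(1, 4, False), (4, 16, False), (12, 16, False), (2, 16, True)]:
    print(disks, cores, ssd, "->", suggested_compactors(disks, cores, ssd))
```

The clamp keeps spinning-disk setups from running more compactions than the disks can serve, while the SSD branch assumes the disks are no longer the constraint and CPU is.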
