
Cassandra - Node stuck in joining as another node is down

I am trying to add another node to a production Cassandra cluster because disk utilization across the nodes is above 90%. However, the new node has been in the joining state for over 2 days. I also noticed that one of the nodes went down (DN) because it is at 100% disk utilization, and Cassandra cannot start on that instance.
Will this affect bootstrap completion for the new node? Are there any immediate solutions for freeing up space on the node that went down?

If I remove this node from the ring, the other nodes will have to take on its data, which will increase their load and disk usage. Can I temporarily move an SSTable (such as the files listed below) off the instance, bring up the server, run cleanup, and then add these files back?

-rw-r--r--. 1 polkitd input      5551459 Sep 17  2020 mc-572-big-CompressionInfo.db
-rw-r--r--. 1 polkitd input  15859691072 Sep 17  2020 mc-572-big-Data.db
-rw-r--r--. 1 polkitd input            8 Sep 17  2020 mc-572-big-Digest.crc32
-rw-r--r--. 1 polkitd input     22608920 Sep 17  2020 mc-572-big-Filter.db
-rw-r--r--. 1 polkitd input   5634549206 Sep 17  2020 mc-572-big-Index.db
-rw-r--r--. 1 polkitd input        12538 Sep 17  2020 mc-572-big-Statistics.db
-rw-r--r--. 1 polkitd input     44510338 Sep 17  2020 mc-572-big-Summary.db
-rw-r--r--. 1 polkitd input           92 Sep 17  2020 mc-572-big-TOC.txt
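
An SSTable is a set of component files that share a prefix (here `mc-572-big`), so if one is moved off the instance, all of its components should move together. As a rough sketch, the listing above can be grouped by prefix and totalled; the regular expression is an assumption based only on the file names shown here.

```python
import re
from collections import defaultdict

# Listing copied from the question (ls -l output for one SSTable).
LISTING = """\
-rw-r--r--. 1 polkitd input      5551459 Sep 17  2020 mc-572-big-CompressionInfo.db
-rw-r--r--. 1 polkitd input  15859691072 Sep 17  2020 mc-572-big-Data.db
-rw-r--r--. 1 polkitd input            8 Sep 17  2020 mc-572-big-Digest.crc32
-rw-r--r--. 1 polkitd input     22608920 Sep 17  2020 mc-572-big-Filter.db
-rw-r--r--. 1 polkitd input   5634549206 Sep 17  2020 mc-572-big-Index.db
-rw-r--r--. 1 polkitd input        12538 Sep 17  2020 mc-572-big-Statistics.db
-rw-r--r--. 1 polkitd input     44510338 Sep 17  2020 mc-572-big-Summary.db
-rw-r--r--. 1 polkitd input           92 Sep 17  2020 mc-572-big-TOC.txt
"""

# Assumed name pattern, inferred from the listing: <prefix>-<Component>.<ext>,
# where <prefix> looks like mc-572-big.
NAME_RE = re.compile(r"^(?P<sstable>[a-z]+-\d+-[a-z]+)-(?P<component>[A-Za-z0-9]+)\.\w+$")


def group_by_sstable(listing):
    """Return {sstable_prefix: {"files": [names], "bytes": total_size}}."""
    groups = defaultdict(lambda: {"files": [], "bytes": 0})
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue  # not an ls -l entry
        size, name = int(fields[4]), fields[-1]
        match = NAME_RE.match(name)
        if match is None:
            continue  # not an SSTable component
        group = groups[match.group("sstable")]
        group["files"].append(name)
        group["bytes"] += size
    return dict(groups)


if __name__ == "__main__":
    for prefix, info in group_by_sstable(LISTING).items():
        print(f"{prefix}: {len(info['files'])} files, {info['bytes'] / 1e9:.1f} GB")
```

For this listing the eight components add up to about 21.6 GB, almost all of it in `Data.db` and `Index.db`.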
 

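If you are using vnodes, the down node will almost certainly impact bootstrapping, because a joining node typically streams data from most of the existing nodes, including the one that is down. For immediate relief, identify tables that are not serving traffic and move the SSTables from those tables to a backup location.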
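
To decide which tables are worth moving, it can help to see how much space each one takes on disk. The following is a minimal sketch, assuming the usual `<data_dir>/<keyspace>/<table>/` directory layout; the `/var/lib/cassandra/data` path is an assumption (check `data_file_directories` in `cassandra.yaml`), and `table_sizes` is a hypothetical helper, not part of any Cassandra tool. Which tables are actually unused by traffic still has to be judged separately.

```python
import os


def table_sizes(data_dir):
    """Return [(bytes, keyspace, table_dir)] sorted largest first.

    Only files directly inside each table directory are counted;
    subdirectories such as snapshots are skipped.
    """
    results = []
    for keyspace in sorted(os.listdir(data_dir)):
        ks_path = os.path.join(data_dir, keyspace)
        if not os.path.isdir(ks_path):
            continue
        for table in sorted(os.listdir(ks_path)):
            tbl_path = os.path.join(ks_path, table)
            if not os.path.isdir(tbl_path):
                continue
            total = 0
            with os.scandir(tbl_path) as entries:
                for entry in entries:
                    if entry.is_file(follow_symlinks=False):
                        total += entry.stat(follow_symlinks=False).st_size
            results.append((total, keyspace, table))
    results.sort(reverse=True)
    return results


if __name__ == "__main__":
    data_dir = "/var/lib/cassandra/data"  # assumed location, adjust as needed
    if os.path.isdir(data_dir):
        for size, keyspace, table in table_sizes(data_dir)[:10]:
            print(f"{size / 1e9:8.1f} GB  {keyspace}/{table}")
    else:
        print(f"{data_dir} not found; pass your own data directory")
```

Any files should only be moved while Cassandra is stopped on that node, and all component files of an SSTable should be moved together.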

I resolved this by temporarily increasing the EBS volume (disk space) on that node, bringing up the server, removing the node from the cluster, clearing out the Cassandra data folders, decreasing the EBS volume, and then adding the node back to the cluster. One thing I noticed is that removing a node from the cluster increases disk usage on the other nodes. So I added additional nodes to distribute the load and ran cleanup on all the other nodes before moving on to removing the node from the cluster.
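
The effect on the other nodes can be estimated with a back-of-the-envelope calculation. This is only a sketch: it assumes equally sized disks, evenly balanced data, an unchanged replication factor, and that cleanup has been run after adding nodes. The cluster size of 6 is a made-up number for illustration, since the original post does not state it.

```python
def usage_after_resize(usage, nodes_before, nodes_after):
    """Estimate per-node disk usage after the cluster changes size.

    The same total data is spread over a different number of nodes,
    so usage scales by nodes_before / nodes_after.
    """
    if nodes_before <= 0 or nodes_after <= 0:
        raise ValueError("node counts must be positive")
    return usage * nodes_before / nodes_after


if __name__ == "__main__":
    nodes = 6      # hypothetical cluster size
    usage = 0.90   # 90% utilization, as in the question

    # Removing one node right away would not fit on the remaining five.
    print(f"remove first: {usage_after_resize(usage, nodes, nodes - 1):.0%}")

    # Adding two nodes and running cleanup first leaves headroom for the removal.
    expanded = usage_after_resize(usage, nodes, nodes + 2)
    after_removal = usage_after_resize(expanded, nodes + 2, nodes + 1)
    print(f"add two, then remove: {after_removal:.0%}")
```

Under these assumptions, removing a node from a 6-node cluster at 90% would push the rest to roughly 108%, while adding two nodes first brings the final figure down to roughly 77%.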

