
DSBulk unloading 1TB of data from Kubernetes DSE Cluster fails

I am using DSBulk to unload data into CSV from a DSE cluster installed under Kubernetes. My cluster consists of 9 Kubernetes pods, each with 120 GB of RAM.

I monitored the resources while unloading and observed that the more data is fetched into CSV, the more RAM is consumed, and the pods are restarting due to lack of memory.

If only one pod is down at a time, the DSBulk unload won't fail, but if 2 pods are down the unload fails with the exception:

Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded).

Is there a way to avoid this memory exhaustion, or is there a way to increase the timeout duration?

The command I am using is:

dsbulk unload -maxErrors -1 -h '["< My Host >"]' -port 9042 -u < My user name >
-p < Password > -k < Key Space > -t < My Table > -url < My Table > 
--dsbulk.executor.continuousPaging.enabled false --datastax-java-driver.basic.request.page-size 1000 
--dsbulk.engine.maxConcurrentQueries 128 --driver.advanced.retry-policy.max-retries 100000
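On the timeout part of the question: the DataStax Java driver 4.x has a `basic.request.timeout` setting, and DSBulk forwards any option prefixed with `--datastax-java-driver.` (as the command above already does for the page size). A hedged sketch, with the same placeholder host, credentials, and table names as above:

```
# Sketch: raise the driver-side read-request timeout (defaults are a few
# seconds). Placeholders (< ... >) must be filled in as in the original command.
dsbulk unload -maxErrors -1 -h '["< My Host >"]' -port 9042 -u < My user name > \
  -p < Password > -k < Key Space > -t < My Table > -url < My Table > \
  --datastax-java-driver.basic.request.timeout "5 minutes"
```

Note this only widens the client-side window; the server-side `read_request_timeout_in_ms` in cassandra.yaml governs when the coordinator gives up.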

After a lot of trial and error, we found out the problem was that the Kubernetes Cassandra pods were using the host server's memory size to compute Max Direct Memory Size, rather than the pod's assigned RAM limit.

The pods were assigned 120 GB of RAM, but Cassandra on each pod was assigning 185 GB of RAM to file_cache_size, which made the unloading process fail, as Kubernetes rebooted any pod that utilized more than 120 GB of RAM.

The reason is that Max Direct Memory Size is calculated as:

Max direct memory = (system memory - JVM heap size) / 2

Each pod was therefore using 325 GB as its Max Direct Memory Size, and each pod's file_cache_size is automatically set to half of the Max Direct Memory Size value. So whenever a pod requested more than 120 GB of memory, Kubernetes restarted it.
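For illustration of the formula (the host and heap sizes here are assumptions, back-computed from the 325 GB figure in this answer):

```shell
# Assumed numbers: a host with ~682 GB of RAM and a 32 GB JVM heap.
host_ram_gb=682
jvm_heap_gb=32
# Max direct memory = (system memory - JVM heap size) / 2
max_direct_gb=$(( (host_ram_gb - jvm_heap_gb) / 2 ))
echo "$max_direct_gb"   # 325 -- far above the 120 GB pod limit
```

The point is that the calculation sees the host's RAM, not the 120 GB cgroup limit, so the derived direct-memory and file-cache budgets can never fit inside the pod.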

The solution was either to set Max Direct Memory Size as an environment variable with a sane default in the Kubernetes cluster's YAML file, or to override it by setting the file_cache_size value in each pod's cassandra.yaml file.
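As a sketch of the first option (the container name and the exact environment variable honored by your Cassandra/DSE image are assumptions; check the image's documentation), the pod spec could cap direct memory explicitly instead of letting it be derived from the host's RAM:

```yaml
# Hypothetical pod-spec fragment: pin MaxDirectMemorySize below the pod limit.
containers:
  - name: cassandra                 # assumed container name
    resources:
      limits:
        memory: 120Gi               # the pod's actual RAM limit
    env:
      - name: JVM_EXTRA_OPTS        # variable name depends on the image
        value: "-XX:MaxDirectMemorySize=50g"
```

With the JVM flag set, file_cache_size is then derived from a value that actually fits inside the pod's 120 GB limit.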
