
Cassandra integration with Hadoop for read performance

I am using Apache Cassandra to store around 100 million records. There is a single node with the following specifications:

RAM: 32 GB, HDD: 2 TB, Intel quad-core processor.

With Cassandra there is a read performance problem. Some queries take around 40 minutes to return their output. After searching for ways to improve read performance, I learned about the following factors:

Compaction strategy, compression techniques, key cache, increasing the heap space, and turning off swap space for Cassandra.

After applying these optimizations, the performance remains the same. After further searching, I came across the idea of integrating Hadoop with Cassandra. Is that the correct way to run queries in Cassandra, or am I missing some other factors here? Thanks.

It looks like your data model could be improved. 40 minutes is far too long. I download all the data from 6 million records (around 10 GB) within a few minutes, and I think it only takes that long because I convert the data while downloading and storing it. Trivial selects should take milliseconds.

Did you build it based on the queries that you need to run?
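
To illustrate what building the model around the queries means, here is a minimal sketch in plain Python, not real Cassandra code. It assumes a hypothetical dataset of `(customer_id, order_id, amount)` rows and a query that always filters by `customer_id`; all of these names are made up for the example. Filtering on a column that is not the key forces a pass over every row, while grouping the rows by the queried column (roughly what a partition key does) turns the read into a single lookup.

```python
from collections import defaultdict

# Hypothetical rows: (customer_id, order_id, amount). Names are illustrative only.
records = [(f"cust{i % 1000}", i, i * 0.5) for i in range(100_000)]


def scan_all(rows, customer_id):
    """Filter on a non-key column: every row is examined, like a full scan."""
    return [row for row in rows if row[0] == customer_id]


def build_partitions(rows):
    """Group rows by the column the query filters on (the 'partition key')."""
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[0]].append(row)
    return partitions


def read_partition(partitions, customer_id):
    """Query-driven read: one lookup returns only the rows that are needed."""
    return partitions.get(customer_id, [])


# Built once at write time; reads then no longer depend on the total row count.
partitions = build_partitions(records)
```

Both functions return the same rows for a given customer, but `scan_all` does work proportional to the whole dataset while `read_partition` touches only one group. If the slow queries filter on columns that are not part of the primary key, that would be the first thing I would check before adding Hadoop.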

