简体繁体中英

Regarding Cassandra Read Performance

原文 2016-04-23 07:59:29 1 2 java/ cassandra

I am working on a sensor data(timeseries). Number of columns in a table is 3000.
for eg: nodeid,timestamp,sen1,sen2,.....sen-n . nodeid and timestamp are primary key with clustering order by timestamp.
Number of records are 10000.

When a SELECT query for single column( SELECT timestamp,sen1 FROM <table> ) requ is requsted through cassandra datastax java driver 3.0 it is replies in 15 sec; ie If I want read all the tags, one a tag at a time for all 3000 tags requires 3000*15 sec = 12 to 13 hours aproximately. It is on single node cluster with 16GB RAM.
I allocated 10GB for JVM. Still response time is not changed. I used LevelCompactionStragy at the time of table creation.

Hardware: Intel Core i7 and Normal Hard disk not SSD,8GB RAM How to reduce that read or query time on the single node cluster?

2 answers

Obviously, there is problem with data modelling. IMO, a table with 3000 columns is bad. if your use case is like "SELECT timestamp,sen1 FROM ", then you should model it as " Primary Key(Timestamp, SensorId) ".

"SELECT timestamp,sen1" in your model, cassandra will still read all other column values from disk into memory.

I am not sure what is 'nodeId' in your case.. I hope it's not cassandra node id..

(SELECT timestamp,sen1 FROM table)

This is like getting all the data at once(in your case 10000 records). So getting 1 column or 3000 columns will make Cassandra server to read through all the SSTables. The point is it won't be 12 or 13 hours.

Still 15 seconds seems unbelievable. Did you also include the network latency and client side write in this measure?

As mentioned in one of the answers your model seems to be bad (If you put timestamp as partion key, the data becomes two sparse and getting a range of data will need to read from more than one partition. If you use only node_id as partition key, the partition will host too much data and can cross the C* limitation of 2 Billion). My advise is

Redesign your partition key. Please check this tutorial for a start. https://academy.datastax.com/resources/getting-started-time-series-data-modeling
Add more no. of nodes and increase replication factor to see better read latencies.
Try to design your read query such that it reads from only one partition at once. eg: SELECT * from Table where sensor_node_id = abc and year = 2016 and month = June

Hope this helps!

Cassandra Read/Get Performance

Cassandra read performance with Astyanax client

Throughput vs replication factor on the read performance of cassandra

Cassandra Read/Write performance - High CPU

Cassandra read performance almost a constant with replication

Cassandra; best practice regarding Indexes?

Cassandra newbie question regarding clusters

Cassandra blob performance

Cassandra cluster poor performance

Low write performance of Cassandra

暂无

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

Related Question Cassandra Read/Get Performance Cassandra read performance with Astyanax client Throughput vs replication factor on the read performance of cassandra Cassandra Read/Write performance - High CPU Cassandra read performance almost a constant with replication Cassandra; best practice regarding Indexes? Cassandra newbie question regarding clusters Cassandra blob performance Cassandra cluster poor performance Low write performance of Cassandra

Related Tags

Regarding Cassandra Read Performance

Question

2 answers

solution1
0 2016-04-23 12:53:09

solution2
0 2016-04-23 14:52:18

Regarding Cassandra Read Performance

Question

2 answers

solution1 0 2016-04-23 12:53:09

solution2 0 2016-04-23 14:52:18

solution1
0 2016-04-23 12:53:09

solution2
0 2016-04-23 14:52:18