
Cassandra timeout during read query at consistency LOCAL_ONE

I have a single-node Cassandra cluster with 32 CPU cores, 32GB of memory, and a RAID of 3 SSDs totalling around 2.5TB. I also have another host with 32 cores and 32GB of memory, on which I run Apache Spark.

I have a huge amount of historical data in Cassandra, maybe 600GB. More than about 1 million new records arrive every day from Kafka, and I need to query these new rows every day. But Cassandra fails, and I'm confused.

My Cassandra table schema is:

CREATE TABLE rainbow.activate (
    rowkey text,
    qualifier text,
    act_date text,
    info text,
    log_time text,
    PRIMARY KEY (rowkey, qualifier)
) WITH CLUSTERING ORDER BY (qualifier ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX activate_act_date_idx ON rainbow.activate (act_date);
CREATE INDEX activate_log_time_idx ON rainbow.activate (log_time);

Because the source data may contain duplicates, I need the primary key to drop duplicate records. There are two indexes on this table: act_date is a date string like '20151211', and log_time is a datetime string like '201512111452'; that is, log_time separates records more finely.
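For example, Cassandra writes are upserts on the primary key, so inserting the same (rowkey, qualifier) pair twice overwrites the earlier row instead of creating a duplicate (the values here are made up for illustration):

-- Both inserts use the same primary key (rowkey, qualifier),
-- so the second one overwrites the first and only one row remains.
INSERT INTO rainbow.activate (rowkey, qualifier, act_date, info, log_time)
VALUES ('device-001', 'q1', '20151211', 'first copy', '201512111452');

INSERT INTO rainbow.activate (rowkey, qualifier, act_date, info, log_time)
VALUES ('device-001', 'q1', '20151211', 'second copy', '201512111452');

-- SELECT * FROM rainbow.activate WHERE rowkey = 'device-001';
-- now returns a single row with info = 'second copy'.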

If I select records using log_time, Cassandra works, but it fails when I use act_date.
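For reference, the two kinds of queries look like this (the literal values are just examples). A log_time value covers only about a minute of data, while an act_date value covers a whole day, so the act_date filter has to touch vastly more rows per scan:

-- Works: each log_time value ('yyyyMMddHHmm') matches roughly one
-- minute of records, so the result set per scan is small.
SELECT rowkey, qualifier, info FROM rainbow.activate
WHERE log_time = '201512111452' ALLOW FILTERING;

-- Times out: each act_date value ('yyyyMMdd') matches a full day,
-- i.e. more than a million rows spread across the whole table.
SELECT rowkey, qualifier, info FROM rainbow.activate
WHERE act_date = '20151211' ALLOW FILTERING;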

At first, the Spark job exited with this error:

java.io.IOException: Exception during execution of SELECT "rowkey", "qualifier", "info" FROM "rainbow"."activate" WHERE token("rowkey") > ? AND token("rowkey") <= ? AND log_time = ? ALLOW FILTERING: All host(s) tried for query failed (tried: noah-cass01/192.168.1.124:9042 (com.datastax.driver.core.OperationTimedOutException: [noah-cass01/192.168.1.124:9042] Operation timed out))

I tried increasing spark.cassandra.read.timeout_ms to 60000, but the job then failed with another error:

java.io.IOException: Exception during execution of SELECT "rowkey", "qualifier", "info" FROM "rainbow"."activate" WHERE token("rowkey") > ? AND token("rowkey") <= ? AND act_date = ? ALLOW FILTERING: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

I don't know how to solve this problem. I read the docs for spark-cassandra-connector but didn't find any tips.

Could you give me some advice to help solve this problem?

Thanks very much!

Sounds like an unusual setup. If you have two machines, it would be more efficient to configure Cassandra as a two-node cluster and run Spark on both nodes. That would spread the data load, and you'd generate far less traffic between the two machines.

Ingesting so much data every day and then querying arbitrary ranges of it sounds like a ticking time bomb. Frequent timeout errors are usually a sign of an inefficient schema, where Cassandra cannot do what you are asking in an efficient way.

I don't see the specific cause of the problem, but I'd consider adding another field to the partition key, such as the day, so that you can restrict your queries to a smaller subset of your data, along the lines of the sketch below.
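Here is a minimal sketch of that idea. The bucket column is my own addition (hypothetical, not from your schema); it spreads each day's million-plus rows across several partitions so no single partition grows too large:

-- Hypothetical redesign: the day (plus a bucket) becomes the partition key.
CREATE TABLE rainbow.activate_by_day (
    act_date text,   -- day string, e.g. '20151211'
    bucket int,      -- hypothetical: e.g. hash(rowkey) % 16, to split one day
    rowkey text,
    qualifier text,
    info text,
    log_time text,
    PRIMARY KEY ((act_date, bucket), rowkey, qualifier)
);

-- One day's rows are then read by looping over the buckets, each of
-- which is a single, bounded partition; no secondary index needed.
SELECT rowkey, qualifier, info FROM rainbow.activate_by_day
WHERE act_date = '20151211' AND bucket = 0;

Note that the bucket must be derived deterministically from the record (for example, a hash of rowkey modulo the bucket count); otherwise the upsert-based deduplication on (rowkey, qualifier) stops working within a day.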
