
Cassandra timeout during read query at consistency LOCAL_ONE

I have a single-node Cassandra cluster with 32 CPU cores, 32GB of memory, and a RAID of 3 SSDs totalling around 2.5TB. I also have another host with 32 cores and 32GB of memory, on which I run Apache Spark.

I have a huge amount of historical data in Cassandra, maybe 600GB. More than about 1 million new records arrive every day from Kafka, and I need to query these new rows every day. But Cassandra fails, and I'm confused.

My Cassandra table schema is:

CREATE TABLE rainbow.activate (
    rowkey text,
    qualifier text,
    act_date text,
    info text,
    log_time text,
    PRIMARY KEY (rowkey, qualifier)
) WITH CLUSTERING ORDER BY (qualifier ASC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

CREATE INDEX activate_act_date_idx ON rainbow.activate (act_date);
CREATE INDEX activate_log_time_idx ON rainbow.activate (log_time);

Because the source data may contain duplicates, I need the primary key to drop duplicate records. There are two indexes on this table: act_date is a date string like '20151211', and log_time is a datetime string like '201512111452'; that is, log_time separates records more finely.
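For example, Cassandra writes are upserts on the primary key, so inserting the same (rowkey, qualifier) pair twice overwrites the earlier row instead of creating a duplicate (the values here are made up for illustration):

-- Both inserts use the same primary key (rowkey, qualifier),
-- so the second one overwrites the first and only one row remains.
INSERT INTO rainbow.activate (rowkey, qualifier, act_date, info, log_time)
VALUES ('device-001', 'q1', '20151211', 'first copy', '201512111452');

INSERT INTO rainbow.activate (rowkey, qualifier, act_date, info, log_time)
VALUES ('device-001', 'q1', '20151211', 'second copy', '201512111452');

-- SELECT * FROM rainbow.activate WHERE rowkey = 'device-001';
-- now returns a single row with info = 'second copy'.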

If I select records using log_time, Cassandra works, but it fails when I use act_date.
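For reference, the two kinds of queries look like this (the literal values are just examples). A log_time value covers only about a minute of data, while an act_date value covers a whole day, so the act_date filter has to touch vastly more rows per scan:

-- Works: each log_time value ('yyyyMMddHHmm') matches roughly one
-- minute of records, so the result set per scan is small.
SELECT rowkey, qualifier, info FROM rainbow.activate
WHERE log_time = '201512111452' ALLOW FILTERING;

-- Times out: each act_date value ('yyyyMMdd') matches a full day,
-- i.e. more than a million rows spread across the whole table.
SELECT rowkey, qualifier, info FROM rainbow.activate
WHERE act_date = '20151211' ALLOW FILTERING;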

At first, the Spark job exited with this error:

java.io.IOException: Exception during execution of SELECT "rowkey", "qualifier", "info" FROM "rainbow"."activate" WHERE token("rowkey") > ? AND token("rowkey") <= ? AND log_time = ? ALLOW FILTERING: All host(s) tried for query failed (tried: noah-cass01/192.168.1.124:9042 (com.datastax.driver.core.OperationTimedOutException: [noah-cass01/192.168.1.124:9042] Operation timed out))

I tried increasing spark.cassandra.read.timeout_ms to 60000, but the job then failed with another error:

java.io.IOException: Exception during execution of SELECT "rowkey", "qualifier", "info" FROM "rainbow"."activate" WHERE token("rowkey") > ? AND token("rowkey") <= ? AND act_date = ? ALLOW FILTERING: Cassandra timeout during read query at consistency LOCAL_ONE (1 responses were required but only 0 replica responded)

I don't know how to solve this problem. I read the docs for spark-cassandra-connector but didn't find any tips.

Could you give me some advice to help solve this problem?

Thanks very much!

Sounds like an unusual setup. If you have two machines, it would be more efficient to configure Cassandra as a two-node cluster and run Spark on both nodes. That would spread the data load, and you'd generate far less traffic between the two machines.

Ingesting so much data every day and then querying arbitrary ranges of it sounds like a ticking time bomb. Frequent timeout errors are usually a sign of an inefficient schema, where Cassandra cannot do what you are asking in an efficient way.

I don't see the specific cause of the problem, but I'd consider adding another field to the partition key, such as the day, so that you can restrict your queries to a smaller subset of your data, along the lines of the sketch below.
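Here is a minimal sketch of that idea. The bucket column is my own addition (hypothetical, not from your schema); it spreads each day's million-plus rows across several partitions so no single partition grows too large:

-- Hypothetical redesign: the day (plus a bucket) becomes the partition key.
CREATE TABLE rainbow.activate_by_day (
    act_date text,   -- day string, e.g. '20151211'
    bucket int,      -- hypothetical: e.g. hash(rowkey) % 16, to split one day
    rowkey text,
    qualifier text,
    info text,
    log_time text,
    PRIMARY KEY ((act_date, bucket), rowkey, qualifier)
);

-- One day's rows are then read by looping over the buckets, each of
-- which is a single, bounded partition; no secondary index needed.
SELECT rowkey, qualifier, info FROM rainbow.activate_by_day
WHERE act_date = '20151211' AND bucket = 0;

Note that the bucket must be derived deterministically from the record (for example, a hash of rowkey modulo the bucket count); otherwise the upsert-based deduplication on (rowkey, qualifier) stops working within a day.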
