
Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

I am doing read and update queries on a table with 500,000 rows, and sometimes get the error below after processing around 300,000 rows, even when no node is down.

Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded)

Infrastructure details:
5 Cassandra nodes, 5 Spark nodes, and 3 Hadoop nodes, each with 8 cores and 28 GB of memory; the Cassandra replication factor is 3.

Cassandra 2.1.8.621 | DSE 4.7.1 | Spark 1.2.1 | Hadoop 2.7.1

Cassandra configuration:

read_request_timeout_in_ms: 10000
range_request_timeout_in_ms: 10000
write_request_timeout_in_ms: 5000
cas_contention_timeout_in_ms: 1000
truncate_request_timeout_in_ms: 60000
request_timeout_in_ms: 10000

I have also tried the same job after increasing read_request_timeout_in_ms to 20,000, but it didn't help.

I am running queries against two tables. Below is the create statement for one of them:

Create Table:

CREATE TABLE section_ks.testproblem_section (
    problem_uuid text PRIMARY KEY,
    documentation_date timestamp,
    mapped_code_system text,
    mapped_problem_code text,
    mapped_problem_text text,
    mapped_problem_type_code text,
    mapped_problem_type_text text,
    negation_ind text,
    patient_id text,
    practice_uid text,
    problem_category text,
    problem_code text,
    problem_comment text,
    problem_health_status_code text,
    problem_health_status_text text,
    problem_onset_date timestamp,
    problem_resolution_date timestamp,
    problem_status_code text,
    problem_status_text text,
    problem_text text,
    problem_type_code text,
    problem_type_text text,
    target_site_code text,
    target_site_text text
    ) WITH bloom_filter_fp_chance = 0.01
    AND caching = '{"keys":"ALL", "rows_per_partition":"NONE"}'
    AND comment = ''
    AND compaction = {'class': 
    'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy'}
    AND compression = {'sstable_compression': 
    'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99.0PERCENTILE';

Queries:

1) SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING;

2) UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345';

Usually when you get a timeout error it means you are trying to do something that isn't scaling well in Cassandra. The fix is often to modify your schema.

I suggest you monitor the nodes while running your query to see if you can spot the problem area. For example, you can run "watch -n 1 nodetool tpstats" to see if any queues are backing up or dropping items. See other monitoring suggestions here.

One thing that might be off in your configuration is that you say you have five Cassandra nodes, but only 3 Spark workers (or are you saying you have three Spark workers on each Cassandra node?). You'll want at least one Spark worker on each Cassandra node so that loading data into Spark is done locally on each node and not over the network.

It's hard to tell much more than that without seeing your schema and the query you are running. Are you reading from a single partition? I started getting timeout errors in the vicinity of 300,000 rows when reading from a single partition. See the question here. The only workaround I have found so far is to use a client-side hash in my partition key to break the partitions up into smaller chunks of around 100K rows, as sketched below. So far I have not found a way to tell Cassandra not to time out on a query that I expect to take a long time.
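As a rough sketch of that bucketing workaround applied to a table shaped like this one (the bucket column, the modulus of 4, and the encounters_bucketed table name are assumptions for illustration, not part of the original schema):

create table section_ks.encounters_bucketed(
    patient_id text,
    bucket int,  -- computed client-side, e.g. hash(encounter_uuid) % 4
    encounter_start_date timestamp,
    encounter_uuid text,
    PRIMARY KEY ((patient_id, bucket), encounter_start_date)
);

-- a read then issues one query per bucket value (0..3) and merges the results:
SELECT encounter_uuid, encounter_start_date
FROM section_ks.encounters_bucketed
WHERE patient_id = '1234' AND bucket = 0
AND encounter_start_date >= '2017-08-19';

Each partition then holds roughly a quarter of the rows it would otherwise hold, which keeps single-partition reads well below the point where the timeouts started.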

I don't think the configuration is the root cause; rather, it's a data model issue.

It would be cool to see the structure of the section_ks.encounters table.

I suggest thinking carefully about which concrete queries you expect to run before designing the table structure(s).

As far as I can see, those two queries require different structures of section_ks.encounters in order to run with good performance.

Let's review each provided query and try to design tables:

First one:

SELECT encounter_uuid, encounter_start_date FROM section_ks.encounters WHERE patient_id = '1234' AND encounter_start_date >= '" + formatted_documentation_date + "' ALLOW FILTERING;

  • First point: if Cassandra forces you to add ALLOW FILTERING, that is a symptom of a non-optimal query or table structure.
  • Second point: the primary key. (There is an awesome explanation of what primary keys are in Cassandra.) The given query would work fast, and without the mandatory ALLOW FILTERING statement, if the patient_id and encounter_start_date columns formed a composite primary key. The enumeration of columns inside the PRIMARY KEY() statement should correspond to the order of filtering in your query.
  • Why is ALLOW FILTERING mandatory in the original query? By the partition key, Cassandra knows on which node the data is located. When the patient_id column is not the partition key, Cassandra has to scan all 5 nodes to find the requested patient. When there is a lot of data across the nodes, such a full scan usually fails with a timeout.

Here is an example of a table structure that fits the given query effectively:

create table section_ks.encounters(
    patient_id text, 
    encounter_start_date timestamp, 
    encounter_uuid text,
    some_other_non_unique_column text,
    PRIMARY KEY (patient_id, encounter_start_date)
);
  • The patient_id column would be the "partition key", responsible for data distribution across Cassandra nodes. In simple words (omitting the replication feature): different ranges of patients would be stored on different nodes.
  • The encounter_start_date column would be the "clustering key", responsible for data sorting inside the partition.

ALLOW FILTERING can now be removed from the query:

SELECT encounter_uuid, encounter_start_date 
FROM section_ks.encounters 
WHERE patient_id = '1234' AND encounter_start_date >= '2017-08-19';

Second query:

UPDATE section_ks.encounters SET testproblem_uuid_set = testproblem_uuid_set + {'1256'} WHERE encounter_uuid = 'abcd345';

The table structure should look close to this:

create table section_ks.encounters(
    encounter_uuid text, -- partition key
    patient_id text,
    testproblem_uuid_set set<text>, 
    some_other_non_unique_column text,
    PRIMARY KEY (encounter_uuid)
);

If we definitely want quick filtering by encounter_uuid only, it should be defined as the partition key. Note that testproblem_uuid_set is declared as set<text> above, because the + {'1256'} operation only works on a collection type; a usage sketch follows.
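A minimal usage sketch against that table (the values are the ones from the question; the read-back SELECT is added for illustration):

UPDATE section_ks.encounters
SET testproblem_uuid_set = testproblem_uuid_set + {'1256'}
WHERE encounter_uuid = 'abcd345';

-- read the accumulated set back:
SELECT testproblem_uuid_set
FROM section_ks.encounters
WHERE encounter_uuid = 'abcd345';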

There are good articles about designing an effective data model.
