Cassandra timeout during read query at consistency ONE/LOCAL_QUORUM
Table Structure:
CREATE TABLE tablename(
col1 text,
col2 text,
col3 timestamp,
col4 timestamp,
col5 text,
col6 timestamp,
.
.
PRIMARY KEY (col5, col6))
WITH CLUSTERING ORDER BY (col6 DESC);
CREATE CUSTOM INDEX indexname on tablename (col1) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX indexname on tablename (col2) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX indexname on tablename (col3) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX indexname on tablename (col4) USING 'StorageAttachedIndex';
CREATE CUSTOM INDEX indexname on tablename (col6) USING 'StorageAttachedIndex';
Read Query:
select col1, col2, col3, col4, col.... from tablename
where col1='text'
and col2='text'
and col3>'timestamp'
and col4>='timestamp'
and col4<='timestamp'
PER PARTITION LIMIT 1;
In Java, I wrote code that executes this query to fetch 100,000 records with the config below:
When I run the code, it works perfectly and responds in around 1 min 20 sec for 100,000 rows.
But when I run it in more than 2 windows in parallel, only one window shows the result and the other windows throw a timeout error:
Cassandra timeout during read query at consistency ONE
"When I run the code, it works perfectly and responds in around 1 min 20 sec"
TBH I'm surprised this returns a result set at all. Cassandra was not designed to support OLAP or queries requiring filtering on many different columns.
The reason it's timing out is that queries based on a secondary index (or multiple indexes, in this case) put extra stress on one node. When they run, a "coordinator" node is selected. That node is then responsible for pulling data from all of the other nodes and assembling the result set (in RAM).
The default timeouts are set with the specific intent of stopping queries like this, because they can (and often do) cause nodes to crash. I imagine that supporting two similar queries in parallel is too much for the cluster to handle.
The way around this is to ensure that your queries always filter on a partition key (col5 in this case). Single partition queries ensure that only a single node will be queried. That's why the idea with Cassandra is to build your tables around the intended queries. In this case, building a query table with partition keys of col1 and col2 would help to ensure that. Adding clustering keys of col3 and col4 will help for your other conditions:
PRIMARY KEY ((col1, col2),col3,col4)
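To make that key concrete, here is a minimal sketch of what such a query table and a matching read might look like. The table name tablename_by_col1_col2, the trailing clustering columns col5 and col6 (added only so that rows from the original table stay unique), and the timestamp value are all assumptions for illustration, not part of the original schema or answer.

```cql
-- Hypothetical query table; the name is made up for this example.
CREATE TABLE tablename_by_col1_col2 (
    col1 text,
    col2 text,
    col3 timestamp,
    col4 timestamp,
    col5 text,
    col6 timestamp,
    -- (col1, col2) is the composite partition key; col3 and col4 are the
    -- clustering keys from the answer. col5 and col6 are appended here as an
    -- assumption, so two source rows with equal col1..col4 do not overwrite
    -- each other.
    PRIMARY KEY ((col1, col2), col3, col4, col5, col6)
);

-- Single-partition read: both partition key columns are fixed with '=',
-- so the request is served by the replicas that own that one partition.
-- The timestamp literal is a placeholder value.
SELECT col1, col2, col3, col4, col5, col6
FROM tablename_by_col1_col2
WHERE col1 = 'text'
  AND col2 = 'text'
  AND col3 > '2023-01-01 00:00:00+0000';
```

One caveat with this layout: once col3 is restricted by a range, CQL will not accept a further range on the next clustering column (col4) as a plain clustering restriction. The col4 bounds would likely have to be applied client-side on the rows returned, or with filtering inside the single partition, so it is worth testing against the real access pattern before committing to this key order.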
Of course, I'm building that definition without an understanding of the cardinality of col1 or col2. As Cassandra has a partition limit of 2GB and 2 billion cells, it's always a good idea to keep your partition sizes much lower than that. In which case, an additional partition key and running more than one query for smaller parts of the data set would be the way to go.
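As a rough sketch of the "additional partition key" idea, one common approach is to add a time bucket to the partition key and issue one query per bucket. The column day_bucket, the table name, and the literal values below are hypothetical; the right bucket size depends on how much data each (col1, col2) pair actually holds.

```cql
-- Hypothetical bucketed variant of the query table.
CREATE TABLE tablename_by_col1_col2_day (
    col1 text,
    col2 text,
    day_bucket date,   -- assumed: calendar day of col3, computed by the writer
    col3 timestamp,
    col4 timestamp,
    col5 text,
    col6 timestamp,
    -- day_bucket joins the partition key, so each partition holds at most
    -- one day of rows for a given (col1, col2) pair.
    PRIMARY KEY ((col1, col2, day_bucket), col3, col4, col5, col6)
);

-- One query per bucket; each one still touches a single partition.
SELECT col1, col2, col3, col4, col5, col6
FROM tablename_by_col1_col2_day
WHERE col1 = 'text'
  AND col2 = 'text'
  AND day_bucket = '2023-01-01'
  AND col3 > '2023-01-01 06:00:00+0000';

SELECT col1, col2, col3, col4, col5, col6
FROM tablename_by_col1_col2_day
WHERE col1 = 'text'
  AND col2 = 'text'
  AND day_bucket = '2023-01-02';
```

The client then merges the per-bucket results. This trades one large scatter-gather read for several small, bounded reads that can also be run concurrently from the application.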
I recommend checking out DataStax Academy, specifically the (free) course DS220 on Data Modeling.