NoHostAvailableException using Cassandra and the DataStax Java driver with a large ResultSet

Question

The setup:

2-node Cassandra 1.2.6 cluster
replicas=2
very large CQL3 table with no secondary index
Rowkey is a UUID.randomUUID().toString()
read consistency set to ONE
Using DataStax Java driver 1.0

The request:

Attempting to do a table scan with "SELECT some-col FROM schema.table LIMIT nnn;"

The fail:

Once I go beyond a certain nnn LIMIT, I start to get NoHostAvailableExceptions from the driver.

It reads like this:

com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.181.13.239 ([/10.181.13.239] Unexpected exception triggered))
            at com.datastax.driver.core.exceptions.NoHostAvailableException.copy(NoHostAvailableException.java:64)
            at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:214)
            at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:169)
            at com.jpmc.es.rtm.storage.impl.EventExtract.main(EventExtract.java:36)
            at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
            at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
            at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
            at java.lang.reflect.Method.invoke(Method.java:601)
            at com.intellij.rt.execution.application.AppMain.main(AppMain.java:120)
Caused by: com.datastax.driver.core.exceptions.NoHostAvailableException: All host(s) tried for query failed (tried: /10.181.13.239 ([/10.181.13.239] Unexpected exception triggered))
            at com.datastax.driver.core.RequestHandler.sendRequest(RequestHandler.java:98)
            at com.datastax.driver.core.RequestHandler$1.run(RequestHandler.java:165)
            at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
            at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)

Given: This is probably not the most enlightened thing to do to a large table with millions of rows, but this is how I learn what not to do, so I would really appreciate it if someone could explain how this kind of error can be debugged.

For example, when this happens, there is no indication that the nodes in the cluster ever had an issue with the request (nothing in the logs on either node indicates a timeout or failure). Also, I enabled tracing on the driver, which gives you some nice autotrace (à la Oracle) info as long as the query succeeds. But in this case, the driver throws a NoHostAvailableException and no ExecutionInfo is available, so tracing has not helped.

I also find it interesting that this does not seem to be recorded as a timeout (my JMX consoles tell me no timeouts have occurred). So, I am left not understanding WHERE the failure is actually occurring. I suspect that it is the driver that is having a problem, but I don't know how to debug it (and I would really like to).

I have read several posts from folks who state that querying for result sets of more than 10000 rows is probably not a good idea, and I am willing to accept this, but I would like to understand what is causing the exception and where the exception is happening.

FWIW, I also tried bumping the timeout properties in cassandra.yaml, but this made no difference whatsoever.

I welcome any suggestions, anecdotes, insults, or monetary contributions for my registration in the house of moron-developers.

Regards!!

Answer 1

My guess (and perhaps others can confirm) is that the query is putting too high a load on the cluster, and that this is what causes the timeout. So, yes, it's a little difficult to debug, because the root cause is not obvious: was the limit I set too large, or is the cluster actually down?

You want to avoid requesting a large amount of data in a single query, typically by setting a reasonable limit and paging through the results, e.g.:

SELECT * FROM messages WHERE user_id = 101 LIMIT 1000;
SELECT * FROM messages WHERE user_id = 101 AND msg_id > [Last message ID received] LIMIT 1000;
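The loop around those two statements can be sketched as follows. This is only an illustration of the paging logic, not driver code: the `TreeMap` is a hypothetical in-memory stand-in for the messages of one user sorted by msg_id, and `fetchPage` and `scanAll` are made-up helper names. In a real client, `fetchPage` would run the CQL above and bind the last msg_id received.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-in: one user's messages, ordered by msg_id.
class ManualPagingSketch {

    // Mimics: SELECT * FROM messages WHERE user_id = 101 AND msg_id > ? LIMIT ?
    // Pass null as lastSeenId for the first page (no msg_id restriction).
    static List<Map.Entry<Long, String>> fetchPage(
            TreeMap<Long, String> table, Long lastSeenId, int limit) {
        Map<Long, String> candidates =
                (lastSeenId == null) ? table : table.tailMap(lastSeenId, false);
        List<Map.Entry<Long, String>> page = new ArrayList<>();
        for (Map.Entry<Long, String> row : candidates.entrySet()) {
            if (page.size() == limit) {
                break;
            }
            page.add(row);
        }
        return page;
    }

    // Reads every row in pages of `limit`, appending values to `out`.
    // Returns the number of non-empty pages fetched.
    static int scanAll(TreeMap<Long, String> table, int limit, List<String> out) {
        int pages = 0;
        Long lastSeenId = null;
        while (true) {
            List<Map.Entry<Long, String>> page = fetchPage(table, lastSeenId, limit);
            if (page.isEmpty()) {
                break;
            }
            pages++;
            for (Map.Entry<Long, String> row : page) {
                out.add(row.getValue());
            }
            // Remember the last msg_id received; the next page starts after it.
            lastSeenId = page.get(page.size() - 1).getKey();
            if (page.size() < limit) {
                break; // a short page means there is nothing left
            }
        }
        return pages;
    }

    public static void main(String[] args) {
        TreeMap<Long, String> table = new TreeMap<>();
        for (long id = 1; id <= 2500; id++) {
            table.put(id, "msg-" + id);
        }
        List<String> out = new ArrayList<>();
        int pages = scanAll(table, 1000, out);
        System.out.println(pages + " pages, " + out.size() + " rows");
    }
}
```

Each request stays bounded at `limit` rows, so no single query has to materialise the whole result set on the coordinator or in the client.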

The Automatic Paging functionality that was added (see this document, from which the code examples in this answer are copied) is a big improvement in the DataStax Java driver, as it removes the need to page manually and lets you do the following:

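// Assumes an open com.datastax.driver.core.Session named `session`
Statement stmt = new SimpleStatement("SELECT * FROM images");
stmt.setFetchSize(100); // number of rows fetched per page
ResultSet rs = session.execute(stmt);

// Iterate over the ResultSet here; further pages are fetched as needed
for (Row row : rs) {
    // process row
}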

While this won't necessarily solve your problem, it will minimise the possibility that it was a "too-big" query.

Answer 1: 2, accepted, 2013-10-22 21:51:36