Datastax Cassandra Java driver RetryPolicy with paging

Question

I'm running a query that fetches millions of rows (about 5,000,000). My nodes seem to be quite busy, because the coordinator returns this exception: com.datastax.driver.core.exceptions.ReadTimeoutException: Cassandra timeout during read query at consistency ONE (1 responses were required but only 0 replica responded). (I don't really know whether the nodes are busy or something else is going on.)

So far I've tried setting a higher read_request_timeout_in_millis on every Cassandra node, and executing the query like this:

// The table name is missing from the query string in the original post.
Statement statement = new SimpleStatement("SELECT * FROM where date = ? ", param1)
    .setFetchSize(pageSize)
    .setConsistencyLevel(ConsistencyLevel.ONE)
    .setReadTimeoutMillis(ONE_DAY_IN_MILLIS);
ResultSet resultSet = this.session.execute(statement);

But the exception is still being thrown. My next move is to try a custom RetryPolicy, but can someone tell me whether a read timeout retry will execute the whole query again, or retry only from the page that failed?

I was trying something like this:

@Override
public RetryDecision onReadTimeout(Statement statement, ConsistencyLevel cl, int requiredResponses, int receivedResponses, boolean dataRetrieved, int nbRetry) {
    if (dataRetrieved) {
        return RetryDecision.ignore();
    } else if (nbRetry < readRetries) {
        LOGGER.info("Retry attempt {} out of {}", nbRetry, readRetries);
        return RetryDecision.retry(cl);
    } else {
        return RetryDecision.rethrow();
    }
}

where readRetries is the number of retries I will attempt when fetching the data.

Answer 1

When you set a fetch size on a query, the driver never fetches the whole result set up front. Even when you do not specify a fetch size, the driver uses a default of 5000 to avoid overloading memory with too many objects. What happens is that a chunk of results is fetched by issuing the query with a limit, and as you iterate over the results and reach the end of the chunk, the driver issues a query for the next chunk, and so on. All in all, if the number of results is bigger than the fetch size, multiple queries are issued from the driver to the cluster. A nice sequence diagram, along with further explanation, can be found on the official DataStax driver page.
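To make the chunk-by-chunk behaviour concrete, here is a minimal sketch in plain Java. It does not use the DataStax driver at all: FakeCluster, PagedIterator and countRequests are hypothetical names invented for illustration, and a simple row offset stands in for the paging state the real driver sends. It only models the idea that the next page is requested when iteration reaches the end of the current one.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Toy model only: FakeCluster and PagedIterator are hypothetical names,
// not classes from the DataStax driver.
final class PagingSketch {

    /** Stand-in for the cluster: serves rows 0..totalRows-1 one page at a time. */
    static final class FakeCluster {
        final int totalRows;
        int requests = 0; // page requests received so far

        FakeCluster(int totalRows) {
            this.totalRows = totalRows;
        }

        // Assumption: a plain offset replaces the driver's paging state.
        List<Integer> fetchPage(int offset, int fetchSize) {
            requests++;
            List<Integer> page = new ArrayList<>();
            int end = Math.min(offset + fetchSize, totalRows);
            for (int row = offset; row < end; row++) {
                page.add(row);
            }
            return page;
        }
    }

    /** Requests the next page only when the current page is exhausted. */
    static final class PagedIterator implements Iterator<Integer> {
        private final FakeCluster cluster;
        private final int fetchSize;
        private List<Integer> page;
        private int offset = 0;   // index of the first row of the current page
        private int position = 0; // next row to return within the current page

        PagedIterator(FakeCluster cluster, int fetchSize) {
            this.cluster = cluster;
            this.fetchSize = fetchSize;
            this.page = cluster.fetchPage(0, fetchSize); // first page, fetched on execute
        }

        @Override
        public boolean hasNext() {
            if (position < page.size()) {
                return true;
            }
            if (page.size() < fetchSize) {
                return false; // a short page means there is nothing left
            }
            offset += page.size();
            page = cluster.fetchPage(offset, fetchSize);
            position = 0;
            return !page.isEmpty();
        }

        @Override
        public Integer next() {
            if (!hasNext()) {
                throw new NoSuchElementException();
            }
            return page.get(position++);
        }
    }

    /** Iterates over every row and reports how many page requests that took. */
    static int countRequests(int totalRows, int fetchSize) {
        FakeCluster cluster = new FakeCluster(totalRows);
        Iterator<Integer> rows = new PagedIterator(cluster, fetchSize);
        while (rows.hasNext()) {
            rows.next();
        }
        return cluster.requests;
    }

    public static void main(String[] args) {
        System.out.println(countRequests(12, 5));
    }
}
```

In this model, a result count that is an exact multiple of the fetch size costs one extra request that returns an empty page. That is a property of the sketch, not a statement about what a real cluster does.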

That being said, RetryPolicy works on a single statement and knows nothing about fetch size, so that statement is retried the number of times you define (meaning only the chunk that timed out is retried).
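The per-chunk retry described above can be sketched the same way, again as a plain-Java simulation rather than driver code. FlakyCluster, FakeTimeout, fetchPageWithRetry and readAll are all hypothetical names; the retry rule copies the question's nbRetry < readRetries check. The request log shows that a timeout on one page causes only that page to be requested again, while earlier pages are not re-read.

```java
import java.util.ArrayList;
import java.util.List;

// Simulation only: nothing here is DataStax driver API.
final class PageRetrySketch {

    /** Hypothetical stand-in for a read timeout (not the driver's ReadTimeoutException). */
    static final class FakeTimeout extends RuntimeException {
    }

    /** Fake cluster in which one page times out a fixed number of times. */
    static final class FlakyCluster {
        final int totalRows;
        final int flakyPage;
        int failuresLeft;
        // Every page request received, including the ones that timed out.
        final List<Integer> requestedPages = new ArrayList<>();

        FlakyCluster(int totalRows, int flakyPage, int failures) {
            this.totalRows = totalRows;
            this.flakyPage = flakyPage;
            this.failuresLeft = failures;
        }

        List<Integer> fetchPage(int pageIndex, int fetchSize) {
            requestedPages.add(pageIndex);
            if (pageIndex == flakyPage && failuresLeft > 0) {
                failuresLeft--;
                throw new FakeTimeout();
            }
            List<Integer> page = new ArrayList<>();
            int start = pageIndex * fetchSize;
            int end = Math.min(start + fetchSize, totalRows);
            for (int row = start; row < end; row++) {
                page.add(row);
            }
            return page;
        }
    }

    /** Retries a single page while nbRetry < maxRetries, then rethrows. */
    static List<Integer> fetchPageWithRetry(FlakyCluster cluster, int pageIndex,
                                            int fetchSize, int maxRetries) {
        for (int nbRetry = 0; ; nbRetry++) {
            try {
                return cluster.fetchPage(pageIndex, fetchSize);
            } catch (FakeTimeout timeout) {
                if (nbRetry >= maxRetries) {
                    throw timeout; // retries exhausted for this page
                }
            }
        }
    }

    /** Reads every page in order; the retry scope is one page, never the whole read. */
    static List<Integer> readAll(FlakyCluster cluster, int fetchSize, int maxRetries) {
        List<Integer> rows = new ArrayList<>();
        for (int pageIndex = 0; ; pageIndex++) {
            List<Integer> page = fetchPageWithRetry(cluster, pageIndex, fetchSize, maxRetries);
            rows.addAll(page);
            if (page.size() < fetchSize) {
                return rows; // a short page ends the read
            }
        }
    }

    public static void main(String[] args) {
        FlakyCluster cluster = new FlakyCluster(12, 1, 2);
        List<Integer> rows = readAll(cluster, 5, 3);
        System.out.println(rows.size() + " rows, requests: " + cluster.requestedPages);
    }
}
```

If the failing page times out more often than maxRetries allows, the exception propagates to the caller and the rows already read remain in the caller's hands only if it collected them itself, which is worth keeping in mind when choosing the retry count.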

Answer 1 (3, accepted, 2016-08-18 08:35:21)