What is the best way to get backpressure for Cassandra Writes?

I have a service that consumes messages off of a queue at a rate that I control. I do some processing and then attempt to write to a Cassandra cluster via the Datastax Java client. I have set up my Cassandra cluster with maxRequestsPerConnection and maxConnectionsPerHost. However, in testing I have found that when I have reached maxConnectionsPerHost and maxRequestsPerConnection, calls to session.executeAsync don't block.

What I am doing right now is using a new Semaphore(maxConnectionsPerHost * maxRequestsPerConnection), acquiring a permit before every async request and releasing it when the future returned by executeAsync completes. This works well enough, but it seems redundant since the driver is already tracking requests and connections internally.

Has anyone come up with a better solution to this problem?

One caveat: I would like a request to be considered outstanding until it has completed. This includes retries! The situation where I am getting retryable failures from the cluster (such as timeouts waiting for consistency) is the primary situation where I want to backpressure and stop consuming messages from the queue.

Problem:

// the rate at which I consume messages depends on how fast this method returns
processMessage(message) {
    // this appears to return immediately even if I have exhausted connections/requests
    session.executeAsync(preparedStatement.bind(...));
}

Current solution:

constructor() {
    this.concurrentRequestsSemaphore = new Semaphore(maxConnectionsPerHost * maxRequestsPerConnection);
}

processMessage(message) {
    ResultSetFuture resultSetFuture = session.executeAsync(preparedStatement.bind(...));
    CompletableFuture<ResultSet> future = completableFromListenable(resultSetFuture);
    concurrentRequestsSemaphore.acquireUninterruptibly();
    // Release the permit whether the write completed successfully or failed.
    future.whenComplete((result, exception) -> concurrentRequestsSemaphore.release());
}

Also, can anyone see any obvious problems with this solution?

One possible idea, to avoid killing the cluster, is to "throttle" your calls to executeAsync: e.g. after a batch of 100 (or whatever number works best for your cluster and workload), pause in the client code and make a blocking call on all 100 futures (or use the Guava library to transform a list of futures into a future of a list).

This way, after issuing 100 async queries, you force the client application to wait for all of them to succeed before proceeding further. If you catch any exception when calling future.get(), you can schedule a retry. Normally a retry has already been attempted by the Java driver's default retry policy.
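A minimal sketch of that batching idea, assuming the Datastax Java driver 3.x and Guava on the classpath (the batch size of 100 and the ThrottledWriter/writeAll names are illustrative, not from the question):

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.Futures;

import java.util.ArrayList;
import java.util.List;

public class ThrottledWriter {
    private static final int BATCH_SIZE = 100; // tune for your cluster and workload

    private final Session session;

    public ThrottledWriter(Session session) {
        this.session = session;
    }

    public void writeAll(List<Statement> statements) throws Exception {
        List<ResultSetFuture> inFlight = new ArrayList<>(BATCH_SIZE);
        for (Statement statement : statements) {
            inFlight.add(session.executeAsync(statement));
            if (inFlight.size() == BATCH_SIZE) {
                // Block until every future in the batch completes before issuing more.
                Futures.allAsList(inFlight).get();
                inFlight.clear();
            }
        }
        if (!inFlight.isEmpty()) {
            // Wait for the trailing partial batch as well.
            Futures.allAsList(inFlight).get();
        }
    }
}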

Regarding a back-pressure signal from the server: starting with CQL binary protocol v3, there is an error code that notifies the client that the coordinator is overloaded: https://github.com/apache/cassandra/blob/trunk/doc/native_protocol_v3.spec#L951
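For illustration, a hedged sketch of detecting that signal on the client, assuming it surfaces as the driver's OverloadedException (whether it reaches application code first depends on the configured retry policy; writeDetectingOverload is a made-up helper name):

import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.datastax.driver.core.exceptions.OverloadedException;

class OverloadCheck {
    static boolean writeDetectingOverload(Session session, Statement statement) {
        ResultSetFuture future = session.executeAsync(statement);
        try {
            future.getUninterruptibly();
            return false;
        } catch (OverloadedException e) {
            // The coordinator reported itself overloaded (the protocol-level error above):
            // back off and stop pulling messages from the queue for a while.
            return true;
        }
    }
}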

From the client, you can get this overloaded information in two ways.

What I am doing right now is using a new Semaphore(maxConnectionsPerHost * maxRequestsPerConnection), acquiring a permit before every async request and releasing it when the future returned by executeAsync completes. This works well enough, but it seems redundant since the driver is already tracking requests and connections internally.

That is a pretty reasonable approach that allows new requests to fill in while other ones complete. You can tie releasing a permit to the future's completion.

The reason why the driver doesn't do this itself is that it tries to do as little blocking as possible and instead fails fast. Unfortunately this pushes some responsibility to the client.

In the usual case it is not good to send that many requests simultaneously to a host at a time. C* has a native_transport_max_threads setting (default 128) that controls the number of threads handling requests at a time. It would be better to throttle yourself at roughly 2 * that number per host. (See: How Cassandra handle blocking execute statement in datastax java driver for more detail.)
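A hedged sketch of that sizing rule (128 is the cassandra.yaml default mentioned above, and permitsFor is just an illustrative helper):

import com.datastax.driver.core.Cluster;

import java.util.concurrent.Semaphore;

class ThrottleSizing {
    // native_transport_max_threads from cassandra.yaml; 128 is the default.
    private static final int NATIVE_TRANSPORT_MAX_THREADS = 128;

    static Semaphore permitsFor(Cluster cluster) {
        int hostCount = cluster.getMetadata().getAllHosts().size();
        // Allow roughly twice native_transport_max_threads in-flight requests per host.
        return new Semaphore(2 * NATIVE_TRANSPORT_MAX_THREADS * hostCount);
    }
}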

I would like a request to be considered outstanding until it has completed. This includes retries! The situation where I am getting retryable failures from the cluster (such as timeouts waiting for consistency) is the primary situation where I want to backpressure and stop consuming messages from the queue.

The driver will not complete the future until the request has completed successfully, exhausted its retries, or failed for some other reason. Therefore you can tie the release of the semaphore permit to the future completing or failing.
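Putting the pieces together, a minimal self-contained sketch of that approach, using Guava's Futures.addCallback directly on the ResultSetFuture rather than converting to a CompletableFuture (class and method names are illustrative):

import com.datastax.driver.core.ResultSet;
import com.datastax.driver.core.ResultSetFuture;
import com.datastax.driver.core.Session;
import com.datastax.driver.core.Statement;
import com.google.common.util.concurrent.FutureCallback;
import com.google.common.util.concurrent.Futures;
import com.google.common.util.concurrent.MoreExecutors;

import java.util.concurrent.Semaphore;

class BackpressuredWriter {
    private final Session session;
    private final Semaphore permits;

    BackpressuredWriter(Session session, int maxInFlight) {
        this.session = session;
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller (and therefore the queue consumer) once maxInFlight requests
    // are outstanding; the driver only completes the future after success, exhausted
    // retries, or a non-retryable failure, so retries stay counted as outstanding.
    void write(Statement statement) {
        permits.acquireUninterruptibly();
        ResultSetFuture future = session.executeAsync(statement);
        Futures.addCallback(future, new FutureCallback<ResultSet>() {
            @Override
            public void onSuccess(ResultSet result) {
                permits.release();
            }

            @Override
            public void onFailure(Throwable t) {
                permits.release();
            }
        }, MoreExecutors.directExecutor());
    }
}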
