Why do CQL batch statements fail when one node of a two-node cluster is down?

Question

I have created a Cassandra cluster with 2 nodes and a keyspace with a replication factor of 2:

CREATE KEYSPACE data WITH REPLICATION = { 'class' : 'SimpleStrategy', 'replication_factor' : 2 };

Everything works fine when both nodes are up. But whenever I take down one of the nodes, I receive the following error from my Java client:

com.datastax.driver.core.exceptions.UnavailableException: Not enough replica available for query at consistency ONE (1 required but only 0 alive)

Since one node is still up and running and the keyspace is replicated, I was expecting the queries to succeed. In fact, I can log in to cqlsh on the running node, run CONSISTENCY ONE, and execute the queries successfully from cqlsh.

But from my Java client, all the queries fail, even though one node is still healthy. However, if I use nodetool to remove the down node manually (nodetool removenode), the Java client then works fine. I'm using the DataStax Java driver.

Here is my test code:

import com.datastax.driver.core.BatchStatement;
import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.PreparedStatement;
import com.datastax.driver.core.Session;

public class CassandraTest {

    public static void main(String[] args) {
        // Connect to the cluster and the "user_data" keyspace;
        // try-with-resources closes the session and cluster on exit
        try (Cluster cluster = Cluster.builder()
                .addContactPoint("172.31.2.11")
                // .withRetryPolicy(DefaultRetryPolicy.INSTANCE)
                // .withLoadBalancingPolicy(
                //         new TokenAwarePolicy(new DCAwareRoundRobinPolicy()))
                .build();
             Session session = cluster.connect("user_data")) {

            // Prepare an insert into the user_profile table
            PreparedStatement statement = session.prepare(
                    "INSERT INTO user_profile (last_name, user_id, user_roles, email, first_name) "
                            + "VALUES (?, ?, ?, ?, ?)");

            // Group two inserts into one batch (a LOGGED batch by default)
            BatchStatement batch = new BatchStatement();
            batch.add(statement.bind("Jones", "22321", "Test Role",
                    "bob@example.com", "Bob"));
            batch.add(statement.bind("Jones2", "222321", "2Test Role",
                    "2bob@example.com", "2Bob"));

            session.execute(batch);
        }
    }
}

It looks like there is an issue with BatchStatements in the DataStax Java driver when one of the Cassandra nodes fails. If I change the code to use a BoundStatement instead of a BatchStatement, the Java code works.
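Why a single bound INSERT still works comes down to replica arithmetic: at consistency ONE the coordinator needs only one live replica of the row, and with replication_factor 2 the surviving node is one. The sketch below is a simplified model, not driver or Cassandra code; the class `ConsistencySketch` and its methods are hypothetical. It computes how many replicas a write needs at a few consistency levels and compares that with the number alive.

```java
/** Hypothetical, simplified model of the replica-count check for a single (non-batch) write. */
final class ConsistencySketch {

    enum Level { ONE, QUORUM, ALL }

    /** Replicas that must acknowledge a write at the given level and replication factor. */
    static int requiredReplicas(Level level, int replicationFactor) {
        switch (level) {
            case ONE:
                return 1;
            case QUORUM:
                return replicationFactor / 2 + 1; // a majority of the replicas
            case ALL:
                return replicationFactor;
            default:
                throw new IllegalArgumentException("unknown level " + level);
        }
    }

    /** Mirrors the "(N required but only M alive)" comparison behind UnavailableException. */
    static boolean canWrite(Level level, int replicationFactor, int aliveReplicas) {
        return aliveReplicas >= requiredReplicas(level, replicationFactor);
    }

    public static void main(String[] args) {
        int rf = 2;    // replication_factor from the CREATE KEYSPACE statement
        int alive = 1; // one of the two nodes is down
        for (Level level : Level.values()) {
            System.out.printf("%-6s required=%d alive=%d ok=%b%n",
                    level, requiredReplicas(level, rf), alive, canWrite(level, rf, alive));
        }
    }
}
```

With one node down, ONE is satisfied (1 required, 1 alive) while QUORUM and ALL each need 2. So the replicas of the row are not what the batch is missing.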

Are there any suggested workarounds to get BatchStatements to work correctly when one of the nodes is down?

Answer 1

For those coming across this post: the mutations (write statements) in a CQL batch need to be persisted to the system.batchlog table on other nodes (up to two live nodes, not counting the coordinator itself) before they are applied. That way, if the batch fails to be written to a replica, the batch in the batchlog can be replayed to the replica that failed.

This behaviour is the LOGGED part of a LOGGED BATCH. It is a fail-safe that ensures that either (a) ALL of the statements in the batch are eventually applied, or (b) NONE of them are. Because (c) there is no rollback mechanism for failed batches, a batch that cannot be logged has to be rejected before any of its statements are applied.
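One rough way to picture this fail-safe: the batch is recorded in the batchlog first, and only then are its mutations applied, so a failure in the first step leaves nothing to undo. The sketch below assumes a hypothetical `LoggedBatchSketch` class in which the batchlog is reduced to a count of live endpoints and the table is a plain list; it illustrates the ordering only and is not how Cassandra is implemented internally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical model of a LOGGED batch: record it in the batchlog first, then apply it. */
final class LoggedBatchSketch {

    /** Thrown when the batch cannot be logged, before any mutation is applied. */
    static final class UnavailableBatchlogException extends RuntimeException {
        UnavailableBatchlogException(int alive) {
            super("batchlog needs 1 live endpoint but only " + alive + " alive");
        }
    }

    /**
     * Applies every mutation to the store, or none of them.
     *
     * @param liveBatchlogEndpoints live nodes that could hold the batchlog entry
     */
    static void execute(List<String> mutations, List<String> store, int liveBatchlogEndpoints) {
        // Step 1: log the batch. There is no rollback later, so any failure
        // must happen here, before anything is written.
        if (liveBatchlogEndpoints < 1) {
            throw new UnavailableBatchlogException(liveBatchlogEndpoints);
        }
        // Step 2: apply the mutations. In Cassandra, a replica that missed one
        // would later receive it through batchlog replay.
        store.addAll(mutations);
    }

    public static void main(String[] args) {
        List<String> batch = Arrays.asList("insert Jones", "insert Jones2");
        List<String> store = new ArrayList<>();

        try {
            execute(batch, store, 0); // the other node is down: nowhere to log
        } catch (UnavailableBatchlogException e) {
            System.out.println("rejected: " + e.getMessage() + ", applied=" + store.size());
        }

        execute(batch, store, 1); // the other node is up
        System.out.println("applied=" + store.size());
    }
}
```

The rejected call leaves `store` empty, which is the "NONE of the batch" outcome; the second call applies both inserts.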

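In this scenario, where there are only two nodes and one of them is down, the coordinator has no other live node to write the batchlog to, so the batch is marked as failed. NONE of the statements in the batch are even attempted, so no rollback is required. Single (non-batch) writes at consistency ONE still succeed because they do not go through the batchlog, and one live replica is enough. After nodetool removenode, the remaining node is the only node in its data center, and in that case the batchlog is written locally, which is why batches work again. Cheers!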
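The three observations in the question (both nodes up, one node down, and after nodetool removenode) can be reproduced with a simplified model of how a coordinator picks batchlog endpoints: skip itself and any dead node, keep up to two, and treat a single-node data center as a special case. The class `BatchlogEndpointSketch`, its method, and the node names are hypothetical, and the real selection (which also spreads endpoints across racks) is more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical, simplified model of batchlog endpoint selection within one data center. */
final class BatchlogEndpointSketch {

    /**
     * Returns the nodes the coordinator would write the batchlog to, or throws
     * if there are none (compare "1 required but only 0 alive").
     */
    static List<String> batchlogEndpoints(String coordinator, List<String> nodesInDc, Set<String> liveNodes) {
        // Special case: in a single-node data center the batch is logged locally.
        if (nodesInDc.size() == 1) {
            return new ArrayList<>(nodesInDc);
        }
        List<String> chosen = new ArrayList<>();
        for (String node : nodesInDc) {
            // Skip the coordinator itself and any node that is down; keep at most two.
            if (!node.equals(coordinator) && liveNodes.contains(node) && chosen.size() < 2) {
                chosen.add(node);
            }
        }
        if (chosen.isEmpty()) {
            throw new IllegalStateException("batchlog: 1 required but only 0 alive");
        }
        return chosen;
    }

    public static void main(String[] args) {
        String a = "node-a"; // coordinator (the node the driver reaches)
        String b = "node-b";

        // 1. Both nodes up: node-b holds the batchlog, so the batch can proceed.
        System.out.println(batchlogEndpoints(a, Arrays.asList(a, b), new HashSet<>(Arrays.asList(a, b))));

        // 2. node-b down but still a cluster member: no endpoint, so the batch fails.
        try {
            batchlogEndpoints(a, Arrays.asList(a, b), Collections.singleton(a));
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getMessage());
        }

        // 3. After nodetool removenode: single-node data center, logged locally.
        System.out.println(batchlogEndpoints(a, Collections.singletonList(a), Collections.singleton(a)));
    }
}
```

In this model, the failure in case 2 is independent of the replication factor of the keyspace. It only depends on whether some node other than the coordinator is alive to hold the batchlog.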

Answer 1
0 2022-09-09 02:08:51