Which node will respond to "SELECT * FROM system.local" using the Cassandra Java driver?

Question

I am trying to write some synchronization code for a Java app that runs on each of the Cassandra servers in our cluster (so each server has one Cassandra instance plus our app). For this I wanted to write a method that returns the 'local' Cassandra node, using the Java driver.

Every process creates a cqlSession using the local address as contactPoint. The driver will figure out the rest of the cluster from that. But my assumption was that the local address would be its 'primary' node, at least for requesting things from the system.local table. This seems not so, when trying to run the code.

Is there a way in the Java driver to determine which of the x nodes the process is running on?

I tried this code:

public static Node getLocalNode(CqlSession cqlSession) {
  Metadata metadata = cqlSession.getMetadata();
  Map<UUID, Node> allNodes = metadata.getNodes();

  Row row = cqlSession.execute("SELECT host_id FROM system.local").one();
  UUID localUUID = row.getUuid("host_id");

  Node localNode = null;
  for (Node node : allNodes.values()) {
    if (node.getHostId().equals(localUUID)) {
      localNode = node;
      break;
    }
  }
  return localNode;
}

But it seems to return random nodes - which makes sense if it just sends the query to one of the nodes in the cluster. I was hoping to find a way without providing hardcoded configuration to determine what node the app is running on.

Answer 1

my assumption was that the local address would be its 'primary' node, at least for requesting things from the system.local table. This seems not so, when trying to run the code.

Correct. When the driver runs a query for which token range ownership cannot be determined, a coordinator is "selected." There is a random component to that selection, but it also takes things like network distance and resource utilization into account.

I'd advise reading the driver documentation on Load Balancing. It does a great job of explaining how the load balancing policies work with the newer drivers (>= 4.10).

In that doc you will find that query routing plans:

are different for each query, in order to balance the load across the cluster;

only contain nodes that are known to be able to process queries, i.e. neither ignored nor down;

favor local nodes over remote ones.

As far as being able to tell which apps are connected to which nodes, try using the execution information returned by the result set. You should be able to get the coordinator's endpoint and hostId that way.

ResultSet rs = session.execute("select host_id from system.local");
Row row = rs.one();
System.out.println(row.getUuid("host_id"));
System.out.println();
System.out.println(rs.getExecutionInfo().getCoordinator());

Output:

9788de64-08ee-4ab6-86a6-fdf387a9e4a2

Node(endPoint=/127.0.0.1:9042, hostId=9788de64-08ee-4ab6-86a6-fdf387a9e4a2, hashCode=2625653a)
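If the goal is to work out which node shares a machine with the app without hardcoded configuration, one possible approach is to compare the machine's own interface addresses with the addresses the cluster reports for each node. The sketch below uses only the JDK. LocalNodeMatcher, findLocal, localAddresses and ip are hypothetical names, the addresses in main are made up, and the map of host IDs to addresses is assumed to be filled from the driver's metadata (for example from each Node's broadcast RPC address in driver 4.x).

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.Collections;
import java.util.Enumeration;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;

// Hypothetical helper, not part of the driver.
final class LocalNodeMatcher {

    // Collects every address bound to this machine's network interfaces.
    static Set<InetAddress> localAddresses() throws SocketException {
        Set<InetAddress> result = new HashSet<>();
        Enumeration<NetworkInterface> nics = NetworkInterface.getNetworkInterfaces();
        if (nics == null) {
            return result; // no interfaces found
        }
        for (NetworkInterface nic : Collections.list(nics)) {
            result.addAll(Collections.list(nic.getInetAddresses()));
        }
        return result;
    }

    // Returns the host ID of the first node whose address is one of ours.
    static Optional<UUID> findLocal(Map<UUID, InetAddress> nodeAddresses,
                                    Set<InetAddress> local) {
        for (Map.Entry<UUID, InetAddress> entry : nodeAddresses.entrySet()) {
            if (local.contains(entry.getValue())) {
                return Optional.of(entry.getKey());
            }
        }
        return Optional.empty();
    }

    // Builds an IPv4 address from literal octets without any DNS lookup.
    static InetAddress ip(int a, int b, int c, int d) {
        try {
            return InetAddress.getByAddress(new byte[] {(byte) a, (byte) b, (byte) c, (byte) d});
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) throws SocketException {
        // Assumption: in a real app this map comes from the driver's node metadata.
        Map<UUID, InetAddress> nodeAddresses = new LinkedHashMap<>();
        nodeAddresses.put(UUID.randomUUID(), ip(10, 0, 0, 1));
        nodeAddresses.put(UUID.randomUUID(), ip(127, 0, 0, 1));

        Optional<UUID> local = findLocal(nodeAddresses, localAddresses());
        System.out.println(local.map(UUID::toString).orElse("no node on this machine"));
    }
}
```

This only works if the addresses the nodes advertise are actually bound to an interface on the same machine, which may not hold behind NAT or in some container setups. Once the matching Node is known, driver 4.x also lets you target it for a single statement via setNode on the statement, if I recall the API correctly, which avoids relying on the load-balancing policy for the system.local query.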

Answer 2

You are correct. The Java driver connects to random nodes by design.

The Cassandra drivers (including the Java driver) are configured with a load-balancing policy (LBP), which determines which nodes the driver contacts, and in which order, when it runs a query against the cluster.

In your case, you didn't configure a load-balancing policy, so it defaults to the DefaultLoadBalancingPolicy. The default policy calculates a query plan (a list of nodes to contact) for every single query, so each plan is different across queries.

The default policy starts from the list of available nodes (down or unresponsive nodes are not included in the query plan). It "prioritises" replicas (nodes which own the data for the query) in the local DC over non-replicas, meaning replicas will be contacted as coordinators before other nodes. If two or more replicas are available, they are ordered "healthiest" first. The nodes in the query plan are also shuffled for randomness, so the driver avoids contacting the same node(s) all the time.
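To see why this makes the coordinator for a system.local query look random, here is a toy model of those plan-building steps. It is not the driver's actual code: ToyQueryPlan, ToyNode, newPlan and the data centre name are made up for illustration. It assumes the query carries no routing key, so no node is preferred as a replica and the shuffle alone decides which node comes first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Random;
import java.util.TreeMap;

// Toy model only; the real policy lives in the driver and does more than this.
final class ToyQueryPlan {

    // Hypothetical stand-in for a cluster node; not a driver class.
    static final class ToyNode {
        final String name;
        final String datacenter;
        final boolean up;

        ToyNode(String name, String datacenter, boolean up) {
            this.name = name;
            this.datacenter = datacenter;
            this.up = up;
        }
    }

    // Builds one plan: keep live nodes in the local DC, then shuffle them.
    static List<ToyNode> newPlan(List<ToyNode> nodes, String localDc, Random rng) {
        List<ToyNode> plan = new ArrayList<>();
        for (ToyNode node : nodes) {
            if (node.up && node.datacenter.equals(localDc)) {
                plan.add(node);
            }
        }
        Collections.shuffle(plan, rng);
        return plan;
    }

    public static void main(String[] args) {
        List<ToyNode> nodes = Arrays.asList(
            new ToyNode("node1", "dc1", true),
            new ToyNode("node2", "dc1", true),
            new ToyNode("node3", "dc1", true),
            new ToyNode("node4", "dc1", false)); // down, never appears in a plan

        // Count how often each node ends up first, i.e. would be tried as coordinator.
        Random rng = new Random();
        Map<String, Integer> coordinatorCounts = new TreeMap<>();
        for (int i = 0; i < 1000; i++) {
            List<ToyNode> plan = newPlan(nodes, "dc1", rng);
            coordinatorCounts.merge(plan.get(0).name, 1, Integer::sum);
        }
        System.out.println(coordinatorCounts);
    }
}
```

Running it should show the first position spread across the three live nodes, which mirrors what the question observed: the contact point is just one candidate among several.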

Hopefully this clarifies why your app doesn't always hit the "local" node. For more details on how it works, see Load balancing with the Java driver.

I gather from your post that you want to circumvent the built-in load-balancing behaviour of the driver. It seems like you have an edge case that I haven't come across, and I'm not sure what outcome you're after. If you tell us what problem you are trying to solve, we might be able to provide a better answer. Cheers!

2 answers

solution1
2 2022-11-21 19:46:56

solution2
1 2022-11-22 01:37:04