Only one node owns data in a Cassandra cluster

Question

I am new to Cassandra and have just set up a Cassandra cluster (version 1.2.8) with 5 nodes, on which I have created several keyspaces and tables. However, I found that all data is owned by one node (in the output below, I have manually replaced the IP addresses with node numbers):

Datacenter: 105
==========
Address         Rack        Status State   Load            Owns                Token
                                                                               4
node-1          155         Up     Normal  249.89 KB       100.00%             0
node-2          155         Up     Normal  265.39 KB       0.00%               1
node-3          155         Up     Normal  262.31 KB       0.00%               2
node-4          155         Up     Normal  98.35 KB        0.00%               3
node-5          155         Up     Normal  113.58 KB       0.00%               4

In their cassandra.yaml files, I use the default settings except for cluster_name, initial_token, endpoint_snitch, listen_address, rpc_address, seeds, and internode_compression. Below are the modified fields that do not contain IP addresses:

endpoint_snitch: RackInferringSnitch
rpc_address: 0.0.0.0
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "node-1, node-2"
internode_compression: none

All nodes use the same seeds.

Where might I have gone wrong in the config? Please feel free to let me know if any additional information is needed to figure out the problem.

Thank you!

Answer 1

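Your token assignment is the problem here. A node's assigned token determines its position in the ring and the range of data it stores. When you generate tokens, the aim is to spread them evenly across the partitioner's entire token range: 0 to (2^127 - 1) for RandomPartitioner, or -2^63 to (2^63 - 1) for Murmur3Partitioner, which is the default in Cassandra 1.2. Tokens aren't IDs that you increment sequentially, as you would with MySQL Cluster. With tokens 0 through 4, the node holding token 0 owns the range that wraps around from 4 back to 0, which is almost the whole ring.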
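To see why sequential tokens leave one node owning everything, here is a minimal Python sketch of how ring ownership follows from tokens. It assumes Murmur3Partitioner (a ring of 2^64 positions) and one token per node; the function name ownership and the dictionary layout are illustrative and not part of any Cassandra API.

```python
# Illustrative sketch only: fraction of the token ring each node owns.
# Assumes Murmur3Partitioner, whose tokens run from -2**63 to 2**63 - 1.
RING_SIZE = 2**64


def ownership(node_tokens, ring_size=RING_SIZE):
    """Map each node name to the fraction of the ring it owns.

    node_tokens maps a node name to its single (distinct) token. A node
    owns the range (previous token, its own token], wrapping around.
    """
    ring = sorted((token, node) for node, token in node_tokens.items())
    result = {}
    for i, (token, node) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the highest token
        # A lone node has span 0 after the modulo, so give it the full ring.
        span = (token - previous) % ring_size or ring_size
        result[node] = span / ring_size
    return result


# Tokens 0..4, as in the question's ring output.
question_tokens = {"node-%d" % (i + 1): i for i in range(5)}
shares = ownership(question_tokens)
for node in sorted(shares):
    print(node, "%.2f%%" % (shares[node] * 100))
```

Under these assumptions node-1 ends up with essentially the whole ring, while each of the other nodes owns a single position out of 2^64, which matches the Owns column in the question.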

There is a tool on git that can help you calculate the tokens based on the size of your cluster.
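If you would rather compute the tokens yourself, the calculation is short. The sketch below is not the tool mentioned above; balanced_tokens is a hypothetical helper that spaces one token per node evenly over the partitioner's range, assuming the ranges commonly documented for RandomPartitioner and Murmur3Partitioner.

```python
# Illustrative sketch only: evenly spaced initial_token values.
def balanced_tokens(node_count, partitioner="Murmur3Partitioner"):
    """Return one evenly spaced token per node for the given partitioner."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    if partitioner == "Murmur3Partitioner":
        # Range -2**63 .. 2**63 - 1 (the default partitioner in Cassandra 1.2).
        step = 2**64 // node_count
        return [i * step - 2**63 for i in range(node_count)]
    if partitioner == "RandomPartitioner":
        # Range 0 .. 2**127 - 1.
        step = 2**127 // node_count
        return [i * step for i in range(node_count)]
    raise ValueError("unsupported partitioner: %s" % partitioner)


# One initial_token value per node for a 5-node cluster.
for index, token in enumerate(balanced_tokens(5), start=1):
    print("node-%d initial_token: %d" % (index, token))
```

Each printed value would go into the initial_token field of the matching node's cassandra.yaml. Check which partitioner your cluster actually uses before applying any of these values.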

Read this article to gain a deeper understanding of tokens. If you want to understand the meaning of the numbers that are generated, check this article out.

Answer 2

If you are starting with Cassandra 1.2.8, you should try using the vnodes feature. Instead of setting initial_token, uncomment # num_tokens: 256 in cassandra.yaml and leave initial_token blank or comment it out. Then you don't have to calculate token positions. Each node will randomly assign itself 256 tokens, and your cluster will be mostly balanced (within a few %). Using vnodes also means that you don't have to "rebalance" your cluster every time you add or remove nodes.

See this blog post for a full description of vnodes and how they work:
http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2
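As a rough illustration of why many random tokens per node balance out, here is a small Python simulation. It is a simplified model rather than Cassandra's actual token allocation code: simulate_vnode_ownership is a hypothetical function that draws num_tokens uniformly random tokens per node from the Murmur3Partitioner range and measures how much of the ring each node ends up owning.

```python
# Illustrative simulation only, not Cassandra's token allocation code.
import random

RING_SIZE = 2**64  # assumes Murmur3Partitioner: tokens -2**63 .. 2**63 - 1


def simulate_vnode_ownership(node_count, num_tokens=256, seed=0):
    """Give every node num_tokens random tokens; return ownership fractions."""
    rng = random.Random(seed)
    names = ["node-%d" % (n + 1) for n in range(node_count)]
    ring = sorted(
        (rng.randrange(-2**63, 2**63), name)
        for name in names
        for _ in range(num_tokens)
    )
    owned = {name: 0 for name in names}
    for i, (token, name) in enumerate(ring):
        previous = ring[i - 1][0]  # for i == 0 this wraps to the highest token
        owned[name] += (token - previous) % RING_SIZE
    return {name: span / RING_SIZE for name, span in owned.items()}


vnode_shares = simulate_vnode_ownership(5)
for name in sorted(vnode_shares):
    print(name, "%.1f%%" % (vnode_shares[name] * 100))
```

With 5 nodes, each share should land near 20%. The exact figures depend on the random seed, so treat any single run as indicative only.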

Answer 3

You should provide a replication_factor when creating a keyspace:

CREATE KEYSPACE demodb WITH REPLICATION = {'class' : 'SimpleStrategy', 'replication_factor': 3};

If you use DESCRIBE KEYSPACE x in cqlsh you'll see what replication_factor is currently set for your keyspace (I assume the answer is 1).

More details here
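To show what replication_factor changes, here is a simplified Python model of SimpleStrategy placement, assuming one token per node: the first replica goes to the node that owns the key's token, and the remaining replicas go to the next nodes clockwise on the ring. The function simple_strategy_replicas and the sample key token are hypothetical and only meant for illustration.

```python
# Illustrative sketch only: a simplified model of SimpleStrategy placement.
import bisect


def simple_strategy_replicas(key_token, node_tokens, replication_factor):
    """Return the nodes that would hold a key, walking the ring clockwise."""
    ring = sorted((token, node) for node, token in node_tokens.items())
    tokens = [token for token, _ in ring]
    # The owner is the first node whose token is >= the key's token,
    # wrapping to the lowest token when the key is past the highest one.
    start = bisect.bisect_left(tokens, key_token) % len(ring)
    copies = min(replication_factor, len(ring))
    return [ring[(start + i) % len(ring)][1] for i in range(copies)]


# Tokens 0..4, as in the question; the key token is an arbitrary example.
question_tokens = {"node-%d" % (i + 1): i for i in range(5)}
replicas_rf1 = simple_strategy_replicas(123456789, question_tokens, 1)
replicas_rf3 = simple_strategy_replicas(123456789, question_tokens, 3)
print("RF=1:", replicas_rf1)
print("RF=3:", replicas_rf3)
```

In this model a higher replication_factor adds copies on node-2 and node-3, but with tokens 0 through 4 almost every key still starts at node-1, so node-4 and node-5 stay nearly empty until the tokens themselves are spread out.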

3 answers

Answer 1: 2 votes, 2013-08-10 12:30:34
Answer 2: 2 votes, accepted, 2013-08-10 16:02:50
Answer 3: -1 votes, 2013-08-10 08:20:04
