
Cassandra - Exactly one wide row per node for a given ColumnFamily?

Subject says it all. I want to be able to randomly distribute a large set of records but keep them clustered in one wide row per node.

As an example, let's say I've got a collection of about 1 million records, each with a unique id. If I just go ahead and set the primary key (and therefore the partition key) to the unique id, I'll get very good random distribution across my server cluster. However, each record will be its own row. I'd like each record to belong to one large wide row (per server node) so I can have them sorted or clustered on some other column.

If, say, I have 5 nodes in my cluster, I could randomly assign a value of 1 - 5 at creation time and set the partition key to this value. But this becomes troublesome if I add or remove nodes. Effectively, what I want is to partition on the record's unique id modulo N (id % N, where N is the number of nodes).

I have to imagine there's a mechanism in Cassandra to essentially randomize the partitioning without even using a key (and then clustering on some column).

Thanks for any help.

You really do not want to do what you are saying you want to do.

First of all, there really is no good mechanism within Cassandra to ensure an even distribution of one row per node. You could do it once by calculating the tokens so the rows land on separate nodes initially, but if you ever changed the cluster topology (e.g. added or removed nodes or datacenters) you would need to manually recalculate the tokens and move data around. All of that work is exactly what Cassandra is designed to do for you.

Instead of going with your strict goal of one row per node, compromise a bit and go with around 100-1000 total rows. Use the last 2 or 3 digits of the id as the shard id (that's just a convenience; anything else that distributes evenly works as well), and create a table like so:

 create table test (shard_id int, id int, value text, primary key (shard_id,id));
 insert into test (shard_id, id, value) values(72,193727872, 'value1');
 insert into test (shard_id, id, value) values(73,193727873, 'value2');
 insert into test (shard_id, id, value) values(73,723423873, 'value3');
 insert into test (shard_id, id, value) values(73,193727874, 'value4');

 select * from test where shard_id = 73;

  shard_id | id        | value
 ----------+-----------+--------
        73 | 193727873 | value2
        73 | 193727874 | value4
        73 | 723423873 | value3

So you achieve even distribution of your data across the cluster because of the shard_id, and by quickly enumerating the shard_ids you can retrieve all of the values. Each partition is wide enough (a few thousand rows each, with a million+ total cells) that you take advantage of linear disk reads, and there are few enough partitions that random seeks stay rare.

You can also perform the other operations (gt/lt comparisons); you just have to do a little extra work in your code to direct each read to the correct shard id, and continue on to the next shard if necessary.
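A cross-shard greater-than read might look like this sketch (again with a dict standing in for the table; the per-shard filter mirrors a hypothetical `WHERE shard_id = ? AND id > ?` query issued once per shard):

```python
# Sketch of a gt comparison spanning shards: query each shard for ids
# above a threshold, then merge client-side. All names illustrative.
NUM_SHARDS = 100
table = {
    72: {193727872: 'value1'},
    73: {193727873: 'value2', 193727874: 'value4', 723423873: 'value3'},
}

def select_gt(threshold):
    # Per shard: the equivalent of WHERE shard_id = ? AND id > ?
    hits = []
    for shard_id in range(NUM_SHARDS):
        for record_id, value in table.get(shard_id, {}).items():
            if record_id > threshold:
                hits.append((record_id, value))
    return sorted(hits)

print(select_gt(193727873))  # [(193727874, 'value4'), (723423873, 'value3')]
```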

- Slight increase in complexity.
- Very small decrease in linear read performance.
- Very good operational runtime characteristics.

You could try using a compound primary key, eg,

create table wideRow(key varchar, value timeuuid, primary key (key, value));

Be aware, though, that with this definition the partition key is key alone and value is just a clustering column, so each distinct key still forms a single wide row that lives on one node (well, one replica set). If you instead declare a composite partition key, primary key ((key, value)), the partitioning is done on the key/value combination and the records do get distributed across your nodes, but then every record becomes its own partition and you lose the wide-row clustering entirely.

Partitioning on id modulo cluster size has exactly the same add/remove-node problems you mentioned earlier. That is why Cassandra uses what is called consistent hashing: only about 1/N of the rows need to move when adding a new node to a cluster of size N, as opposed to nearly all of them with the modulus approach.
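The difference is easy to demonstrate with a toy hash ring. This sketch is illustrative only (real Cassandra assigns token ranges via its partitioner; the vnode count, node names, and key count here are made up), but it shows why modulus partitioning reshuffles almost everything while a ring moves only a small slice:

```python
# Contrast id-modulus-N partitioning with consistent hashing when a
# 6th node joins a 5-node cluster. Toy hash ring with virtual nodes,
# loosely in the spirit of Cassandra's vnodes; all names illustrative.
import bisect
import hashlib

VNODES = 64  # virtual nodes per physical node, to even out the ring

def h(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

def build_ring(nodes):
    # Each physical node owns VNODES points on the ring.
    return sorted((h(f'{n}#{v}'), n) for n in nodes for v in range(VNODES))

def ring_owner(ring, key):
    # A key belongs to the first vnode clockwise of its hash (wrapping).
    hashes = [point for point, _ in ring]
    i = bisect.bisect_right(hashes, h(str(key))) % len(ring)
    return ring[i][1]

keys = range(10000)
ring5 = build_ring([f'node{i}' for i in range(5)])
ring6 = build_ring([f'node{i}' for i in range(6)])

moved_ring = sum(ring_owner(ring5, k) != ring_owner(ring6, k) for k in keys)
moved_mod = sum(k % 5 != k % 6 for k in keys)
print(moved_mod, moved_ring)  # modulus moves ~5/6 of the keys, the ring far fewer
```

With modulus partitioning a key stays put only when id % 5 == id % 6 (i.e. id % 30 < 5), so about 5/6 of the keys move; on the ring only the keys falling into the new node's slices move, roughly 1/6 of them.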

More: http://en.wikipedia.org/wiki/Consistent_hashing
