
How to store data in Cassandra to query all records sorted by one column?

I have a table that stores users, and I want to query all users sorted by their score. What is the most efficient way to achieve this?

Note: performance matters to me as well.

If Cassandra can't do this, can I use something like Apache Solr integrated with Cassandra?

Within a partition Cassandra stores data in sorted order, so you can create a table like this:

CREATE TABLE sorted_users (user_type INT, user_id UUID, score INT,
    PRIMARY KEY (user_type, score, user_id)) WITH CLUSTERING ORDER BY (score DESC);

When you insert users into the table, set user_type to 1 so that all the users are put into the same partition. The score column is then a clustering column, so rows will be sorted by it in descending order. Then you can efficiently read out the users in sorted order or do range queries based on the score column. A partition can hold at most 2 billion cells (rows times columns), although much smaller partitions perform better in practice.
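As a rough sketch of the insert and range-query pattern just described (the score values and the use of uuid() to generate ids are illustrative assumptions, not part of the original answer):

```cql
-- Put every user in the single partition user_type = 1.
INSERT INTO sorted_users (user_type, score, user_id) VALUES (1, 150, uuid());
INSERT INTO sorted_users (user_type, score, user_id) VALUES (1, 90, uuid());

-- Range query on the clustering column; rows come back in descending score order.
SELECT user_id, score FROM sorted_users
 WHERE user_type = 1 AND score >= 100 AND score < 200;
```

Because the partition key is fixed with an equality, the range restriction on score is served from a single partition and needs no ALLOW FILTERING.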

You might have another table with all the user details where user_id is the primary key, and just use this one when you want to query based on score.

To get the top 10 users, you would do:

SELECT user_id, score FROM sorted_users WHERE user_type = 1 LIMIT 10;

To update a user's score, you'd need to delete the row with the old score and insert a new row with the new score, since you can't directly update a primary key column.
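A minimal sketch of that delete-then-insert step, assuming a user whose score moves from 120 to 135 (the UUID and both scores are made-up values):

```cql
-- Both statements target the partition user_type = 1,
-- so the batch is applied to that partition as one unit.
BEGIN BATCH
  DELETE FROM sorted_users
   WHERE user_type = 1 AND score = 120
     AND user_id = 123e4567-e89b-12d3-a456-426614174000;
  INSERT INTO sorted_users (user_type, score, user_id)
  VALUES (1, 135, 123e4567-e89b-12d3-a456-426614174000);
APPLY BATCH;
```

Note that the DELETE needs the old score, because score is part of the primary key. A separate table keyed by user_id is one place to look it up first.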

Most likely:

  1. You will have PRIMARY KEY (user_id) (the exact form of user_id depends on your domain/application).

  2. user_id will then be the partition key: Cassandra applies its hash function (Murmur3) to the partition key value to decide which node stores the partition.

  3.1. You could add score as a clustering column (the column by which the data inside a partition is sorted), but since each user_id is unique, every partition would hold a single row, so it doesn't make much sense.

  3.2. So you can't ask for all users sorted by score, because the users are distributed among the nodes in Cassandra.

  3.3. If you run select * from users order by score; you will get back Bad Request: ORDER BY is only supported when the partition key is restricted by an EQ or an IN. (which confirms 3.2).

  3.4. Of course, you can still run select * from users, but then you need to sort the results yourself in your application.
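The layout these points describe might look like the following sketch. The table name users comes from the queries above, while the name column is a hypothetical addition for illustration:

```cql
-- Hypothetical user table keyed by user_id alone (one row per partition).
CREATE TABLE users (user_id UUID PRIMARY KEY, name TEXT, score INT);

-- Rejected: ORDER BY needs the partition key restricted by EQ or IN.
-- SELECT * FROM users ORDER BY score;

-- Accepted, but rows come back in token order, so the application must sort them.
SELECT user_id, score FROM users;
```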

Regarding Solr, I can't say for sure, but as far as I know, Spark is usually used for this purpose (it gives you more query capabilities and keeps the data in memory as far as it can). There is an official connector from DataStax, https://github.com/datastax/spark-cassandra-connector, that you can look into.


 