
Cassandra pagination and token function; selecting a partition key

I've been doing a lot of reading lately on Cassandra data modelling and best practices.

What escapes me is what the best practice is for choosing a partition key if I want an application to page through results via the token function.

My current problem is that I want to display 100 results per page in my application and be able to move on to the next 100 after.

From this post: https://stackoverflow.com/a/24953331/1224608 I was under the impression a partition key should be selected such that data spreads evenly across each node. That is, a partition key does not necessarily need to be unique.
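Here is a toy model of how partition keys map to nodes. It illustrates why a partition key does not have to be unique: every row with the same key hashes to the same token and so lands on the same node, and the spread across nodes depends on how many *distinct* keys there are. Assumptions: MD5 stands in for Cassandra's Murmur3Partitioner, and the three-node ring (`RING`, `owner`) is hypothetical, with no vnodes or replication.

```python
import hashlib
from bisect import bisect_left
from collections import Counter

def token(partition_key: str) -> int:
    # Assumption: MD5 truncated to a signed 64-bit integer stands in for
    # Cassandra's Murmur3Partitioner; only the idea of hashing matters here.
    digest = hashlib.md5(partition_key.encode()).digest()
    return int.from_bytes(digest[:8], "big", signed=True)

# Hypothetical 3-node ring (no vnodes, no replication). Each node owns the
# token range (previous node's token, its own token].
RING = [(-(2**63) // 3, "node1"), (2**63 // 3, "node2"), (2**63 - 1, "node3")]
RING_TOKENS = [t for t, _ in RING]

def owner(tok: int) -> str:
    # First node whose token is >= tok; wrap around past the last node.
    i = bisect_left(RING_TOKENS, tok)
    return RING[i % len(RING)][1]

# 1000 rows but only 50 distinct (non-unique) partition keys, 20 rows each.
rows = [(f"user{i % 50}", i) for i in range(1000)]
rows_per_node = Counter(owner(token(pk)) for pk, _ in rows)
print(rows_per_node)
```

Because whole partitions move together, each node's row count is a multiple of 20 here. A key with few distinct values would give only a handful of partitions and an uneven spread.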

However, if I'm using the token function to page through results, e.g.:

SELECT * FROM table WHERE token(partitionKey) > token('someKey') LIMIT 100;

That means the number of rows returned for a page may not match the number of results I show on that page, since multiple rows can share the same token(partitionKey) value. Or worse, if more than 100 rows share a partition key, I will miss results.
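The concern can be reproduced with a small in-memory model. Assumptions: `token` is an MD5-based stand-in for the real partitioner, and rows come back in token order and then clustering order, as Cassandra returns them. Paging with `token(pk) > token(lastKey) LIMIT 100` silently drops the rest of any partition that a page boundary cuts through, even when each partition holds far fewer than 100 rows.

```python
import hashlib

def token(pk: str) -> int:
    # Hypothetical stand-in for Cassandra's partitioner (signed 64-bit hash).
    return int.from_bytes(hashlib.md5(pk.encode()).digest()[:8], "big", signed=True)

def build_table(partitions: int, rows_per_partition: int) -> list:
    # Rows as (partition key, clustering key), in token order, then clustering order.
    rows = [(f"key{p}", ck) for p in range(partitions) for ck in range(rows_per_partition)]
    return sorted(rows, key=lambda r: (token(r[0]), r[1]))

def select_after(table, last_pk, limit):
    # Models: SELECT * FROM table WHERE token(pk) > token(last_pk) LIMIT limit
    # (last_pk=None models the first page, which has no WHERE clause).
    start = token(last_pk) if last_pk is not None else None
    matching = [r for r in table if start is None or token(r[0]) > start]
    return matching[:limit]

def page_by_token(table, page_size):
    pages, last_pk = [], None
    while True:
        page = select_after(table, last_pk, page_size)
        if not page:
            return pages
        pages.append(page)
        last_pk = page[-1][0]  # partition key of the last row shown

table = build_table(partitions=30, rows_per_partition=7)  # 210 rows
pages = page_by_token(table, page_size=100)
seen = sum(len(p) for p in pages)
print(len(table), seen)  # 210 200
```

Both pages are full, yet 10 rows are never shown: the remaining rows of the two partitions that were split at a page boundary.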

The only way I could guarantee 100 results on every page (except the last) is to make the partition key unique. I could then read the last key on my page and fetch the next page with an almost identical query:

SELECT * FROM table WHERE token(partitionKey) > token('lastKeyOfCurrentPage') LIMIT 100;
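With a unique partition key every partition holds exactly one row, so the same query never splits a partition. A minimal model of that approach, again assuming an MD5-based `token` as a hypothetical stand-in for the partitioner:

```python
import hashlib

def token(pk: str) -> int:
    # Hypothetical stand-in for Cassandra's partitioner (signed 64-bit hash).
    return int.from_bytes(hashlib.md5(pk.encode()).digest()[:8], "big", signed=True)

# One row per partition: the partition key is unique. Rows come back in token order.
table = sorted((f"key{i}" for i in range(210)), key=token)

def next_page(table, last_pk, limit=100):
    # Models: SELECT * FROM table WHERE token(pk) > token(last_pk) LIMIT 100
    if last_pk is None:
        return table[:limit]
    start = token(last_pk)
    return [pk for pk in table if token(pk) > start][:limit]

pages, last_pk = [], None
while page := next_page(table, last_pk):
    pages.append(page)
    last_pk = page[-1]  # last key of the current page

print([len(p) for p in pages])  # [100, 100, 10]
```

Every row is visited exactly once and all pages except the last are full. The trade-off is that each partition is a single row, which rules out wide rows.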

But I'm not certain if it's good practice to have a unique partition key for a complex table.

Any help is greatly appreciated!

Regarding "But I'm not certain if it's good practice to have a unique partition key for a complex table":

How you should choose your partition key depends on your requirements and data model. If the partition key is the entire primary key, it has to be unique; otherwise a write with an existing key is an upsert that overwrites the existing data. If the table has wide rows (a clustering key), making the partition key unique (a value that appears only once in the table) defeats the purpose of the wide row: in CQL, "wide rows" just means there can be more than one row per partition, whereas with a unique partition key there is exactly one row per partition. It would help if you could provide the schema.
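The difference between the two designs can be sketched with plain dictionaries. The table layouts (`users`, `events`) and the `upsert` helper are hypothetical; the model only captures that a CQL write to an existing primary key overwrites the written columns, while a new clustering key adds another row to the same partition.

```python
from collections import defaultdict

# Hypothetical table with PRIMARY KEY (user_id): the partition key is the
# whole primary key, so there is one row per partition.
users = {}

# Hypothetical table with PRIMARY KEY ((user_id), event_time): a wide
# partition holding many rows, keyed by the clustering column.
events = defaultdict(dict)

def upsert(row_store, key, values):
    # CQL INSERT/UPDATE are upserts: writing to an existing primary key
    # overwrites the columns you write and keeps the others.
    row_store[key] = {**row_store.get(key, {}), **values}

upsert(users, "alice", {"name": "Alice", "city": "Paris"})
upsert(users, "alice", {"name": "Alice B."})        # same key: overwritten

upsert(events["alice"], 1, {"action": "login"})     # new clustering key
upsert(events["alice"], 2, {"action": "logout"})    # another row, same partition

print(len(users), users["alice"])  # 1 {'name': 'Alice B.', 'city': 'Paris'}
print(len(events), len(events["alice"]))  # 1 2
```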

See the links below about pagination in Cassandra.

You do not need to use the token function if you are on Cassandra 2.0+. Cassandra 2.0 added automatic paging to the native protocol, so instead of building paging yourself with the token function, you set a fetch size and the driver retrieves the results page by page as you iterate.
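Conceptually, auto paging means the application just iterates while the driver requests the next page whenever the current one is used up (in the Java driver the page size is set with `Statement.setFetchSize`). The model below imitates that behaviour without a cluster. `iterate_all`, `fake_server` and the integer paging state are hypothetical; a real paging state is an opaque value produced by the server.

```python
from typing import Callable, Iterator, Optional

# A fetch function returns (rows, next_paging_state); None means no more pages.
FetchPage = Callable[[Optional[int], int], tuple[list, Optional[int]]]

def iterate_all(fetch_page: FetchPage, fetch_size: int) -> Iterator:
    # Mimics driver auto paging: the caller only iterates,
    # and each page is requested only when the previous one is exhausted.
    state = None
    while True:
        rows, state = fetch_page(state, fetch_size)
        yield from rows
        if state is None:
            return

# Hypothetical server side: 250 rows, paging state = offset of the next row.
DATA = list(range(250))
requests = []

def fake_server(state, fetch_size):
    start = state or 0
    requests.append(start)
    end = start + fetch_size
    return DATA[start:end], (end if end < len(DATA) else None)

rows = list(iterate_all(fake_server, fetch_size=100))
print(len(rows), requests)  # 250 [0, 100, 200]
```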

Results pagination in Cassandra (CQL)

https://www.datastax.com/dev/blog/client-side-improvements-in-cassandra-2-0
https://docs.datastax.com/en/developer/java-driver/2.1/manual/paging/

Saving and reusing the paging state

You can use the PagingState object, which represents where you are in the result set when the last page was fetched. Save it, then set it on the statement for the next request to resume from that point.
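A paging state works because it records the exact position of the last row returned (partition key and clustering key), not just the partition. The sketch below models a saved, opaque paging state and shows that resuming from it does not skip the rest of a split partition. The base64/JSON encoding and all function names are hypothetical, and the two filtering steps model the manual equivalent in CQL: first `pk = :pk AND ck > :ck`, then `token(pk) > token(:pk)`.

```python
import base64
import hashlib
import json
from typing import Optional

def token(pk: str) -> int:
    # Hypothetical stand-in for Cassandra's partitioner (signed 64-bit hash).
    return int.from_bytes(hashlib.md5(pk.encode()).digest()[:8], "big", signed=True)

def build_table(partitions: int, rows_per_partition: int) -> list:
    rows = [(f"key{p}", ck) for p in range(partitions) for ck in range(rows_per_partition)]
    return sorted(rows, key=lambda r: (token(r[0]), r[1]))

def encode_state(row) -> str:
    # Hypothetical opaque paging state: the (pk, ck) position of the last row.
    return base64.urlsafe_b64encode(json.dumps(row).encode()).decode()

def decode_state(state: str):
    pk, ck = json.loads(base64.urlsafe_b64decode(state))
    return pk, ck

def fetch_page(table, state: Optional[str], limit: int):
    if state is None:
        page = table[:limit]
    else:
        pk, ck = decode_state(state)
        # Step 1 models: WHERE pk = :pk AND ck > :ck (rest of the current partition)
        page = [r for r in table if r[0] == pk and r[1] > ck][:limit]
        # Step 2 models: WHERE token(pk) > token(:pk) LIMIT remaining
        page += [r for r in table if token(r[0]) > token(pk)][: limit - len(page)]
    next_state = encode_state(page[-1]) if len(page) == limit else None
    return page, next_state

table = build_table(partitions=30, rows_per_partition=7)  # 210 rows
pages, state = [], None
while True:
    page, state = fetch_page(table, state, limit=100)  # state could be stored between requests
    if page:
        pages.append(page)
    if state is None:
        break

print([len(p) for p in pages])  # [100, 100, 10]
```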

EDIT:

See also:

Paging Resultsets in Cassandra with compound primary keys - Missing out on rows

I recently did a proof of concept (POC) for a similar problem, so I'll add it here briefly.

First, there is a table with two fields; we use only a few fields, just for illustration.

1. Say we insert a million rows with this.

Along comes the product owner with a (rather strange) requirement that we list all the data as pages in the GUI. Assume there are a hundred entries, 10 per page.

  1. For this, we add a column called page_no to the table.
  2. Create a secondary index on this column.
  3. Then do a one-time update of this column with page numbers. Page number 10 means 10 contiguous rows updated with page_no set to 10.
  4. Since we can query on a secondary index, each page can be queried independently.
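Steps 3 and 4 can be modelled in a few lines. This is a Python sketch with hypothetical names, not the Go code from the POC: a one-time pass assigns page_no to contiguous rows, and a dictionary from page_no to rows stands in for the secondary index, so a single page can be looked up directly (in CQL, a query filtering on `page_no = 10`).

```python
from collections import defaultdict

PAGE_SIZE = 10  # assumption: 10 entries per page, as in the example above

def assign_page_numbers(row_ids, page_size=PAGE_SIZE):
    # One-time pass: each run of page_size contiguous rows gets the same page_no (1-based).
    return {row_id: i // page_size + 1 for i, row_id in enumerate(row_ids)}

def build_index(page_of):
    # Stand-in for a secondary index on page_no: page_no -> row ids.
    index = defaultdict(list)
    for row_id, page_no in page_of.items():
        index[page_no].append(row_id)
    return index

row_ids = [f"row{i}" for i in range(100)]  # a hundred entries
page_of = assign_page_numbers(row_ids)
index = build_index(page_of)
print(len(index), index[10][:3])  # 10 ['row90', 'row91', 'row92']
```

Note that the page numbers are fixed at update time: inserting or deleting rows later would require renumbering to keep pages contiguous.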

The code is self-explanatory and is here: https://github.com/alexcpn/testgo

Note that there are plenty of cautions about using secondary indexes properly; please check them. I believe I am using it properly in this use case, but I have not tested it with multiple clusters.

"In practice, this means indexing is most useful for returning tens, maybe hundreds of results. Bear this in mind when you next consider using a secondary index." From http://www.wentnet.com/blog/?p=77
