简体   繁体   中英

Cassandra: secondary index inside a single partition (per partition indexing)?

This question is I hope not answered in the usual "secondary index v. clustering key" questions.

Here is a simple model I have:

CREATE TABLE ks.table1 (
name text,
timestamp int,
device text,
value int,
PRIMARY KEY (md_name, timestamp, device)
)

Basically I view my data as datasets with name name , each dataset is a kind of sparse 2D matrix (rows = timestamps , columns = device ) containing value .

As the problem and the queries can be pretty symmetric (ie. is my "matrix" the best representation, or should I use the transposed "matrix") I couldn't decide easily what clustering key I should put first. It makes a bit more sense the way I did: for each timestamp I have a set of data (values for each devices present at that timestamp).

The usual query is then

select * from cycles where md_name = 'xyz';

It targets a single partition, that will be super fast, easy enough. If there's a large amount of data my users could do something like this instead:

select * from cycles where md_name = 'xyz' and timestamp < n;

However I'd like to be able to "transpose" the problem and do this:

 select * from cycles where md_name = 'xyz' and device='uvw';

That means I have to create a secondary index on device .

But (and that's where the question starts"), this index is a bit different from usual indexes, as it is used for queries inside a single partition. Create the index allows to do the same on multiple partitions:

select * from cycles where device='uvw'

Which is not necessary in my case.

  • Can I improve my model to support such queries without too much duplication?
  • Is there something like a "per-partition index"?

The index would allow you to do queries like this:

select * from cycles where md_name='xyz' and device='uvw'

But that would return all timestamps for that device in the xyz partition.

So it sounds like maybe you want two views of the data. Once based on name and time range, and one based on name, device, and time range.

If that's what you're asking, then you probably need two tables. If you're using C* 3.0, then you could use the materialized views feature to create the second view. If you're on an earlier version, then you'd have to create the two tables and do a write to each table in your application.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM