Astyanax Composite Keys in Cassandra

I'm trying to create a schema that will let me access rows using only part of the row key. For example, the key is of the form user_id:machine_os:machine_arch

An example of a row key: 12242:"windows2000":"x86"

From the documentation I could not understand whether this will let me query all rows that have user_id = 12242, or all rows that have "windows2000".

Is there any feasible way to achieve this?

Thanks,

Yadid

Alright, here is what is happening: based on your schema, you are effectively creating a column family with a composite row key (a composite partition key, in CQL terms). This means every component of the composite key except the last one must be restricted with a strict equality relation. The last component can also use the IN relation, but the 1st and 2nd components cannot. Range (inequality) relations are not supported on partition key components unless you go through the token() function.

Additionally, you must specify all three parts if you want to do any kind of filtering. This is necessary because, without every part of the partition key, the coordinator node has no way of knowing which node in the cluster holds the data (remember, Cassandra uses the partition key to determine replicas and data placement).
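For reference, the table this answer has in mind probably looks something like the sketch below. The table name datacf comes from the queries in this answer; the column types and the payload column are assumptions made purely for illustration, since the question does not show the actual schema.

```cql
-- Assumed schema (a sketch, not the asker's actual definition).
-- The double parentheses make all three columns part of the partition key,
-- which mirrors a composite row key of the form user_id:machine_os:machine_arch.
CREATE TABLE datacf (
    user_id      bigint,
    machine_os   text,
    machine_arch text,
    payload      text,   -- hypothetical value column
    PRIMARY KEY ((user_id, machine_os, machine_arch))
);
```

With a definition like this, the three values are hashed together to choose the replicas, which is why a query that leaves any of them out cannot be routed.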

Effectively, this means you can't do any of these:

select * from datacf where user_id = 100012; -- missing 2nd and 3rd key components
select * from datacf where user_id = 100012 and machine_arch = 'x86'; -- missing 2nd key component (machine_os)
select * from datacf where machine_arch = 'x86'; -- missing 1st and 2nd key components
select * from datacf where user_id = 100012 and machine_arch in ('x86', 'x64'); -- nope, still missing the 2nd (machine_os)

However, you will be able to run queries like this:

select * from datacf where user_id = 100012 and machine_arch = 'x86'
   and machine_os = 'windows2000'; -- yes! all 3 parts are there

select * from datacf where user_id = 100012 and machine_os = 'windows2000'
   and machine_arch in ('x86', 'x64'); -- the last part of the key can use IN as well as equality

To answer your initial question: with your existing data model, you will be able to query neither all rows with user_id = 12242 nor all rows that have "windows2000" as the machine_os.

If you can tell me exactly what kind of queries you will be running, I can probably help design the table accordingly. Cassandra data models usually work better when designed from the data retrieval perspective. Long story short: use only user_id as your primary key, and use secondary indexes on the other columns you want to query on.
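A minimal sketch of that suggestion might look like the following. The table name datacf_by_user, the column types, and the payload column are assumptions for illustration only, not something taken from the question.

```cql
-- Sketch of the suggested alternative: user_id alone is the primary key.
-- Table name, column types and payload column are hypothetical.
CREATE TABLE datacf_by_user (
    user_id      bigint,
    machine_os   text,
    machine_arch text,
    payload      text,
    PRIMARY KEY (user_id)
);

-- Secondary indexes on the columns that should be queryable on their own.
CREATE INDEX ON datacf_by_user (machine_os);
CREATE INDEX ON datacf_by_user (machine_arch);

-- Both lookups from the question now become possible:
SELECT * FROM datacf_by_user WHERE user_id = 12242;
SELECT * FROM datacf_by_user WHERE machine_os = 'windows2000';
```

Keep in mind that with user_id as the whole primary key there is exactly one row per user, so this only fits if each user has a single machine. If a user can have several, keeping user_id as the partition key and making machine_os and machine_arch clustering columns would be the variation to consider.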
