
Cassandra, how to filter and update a big table dynamically?

I'm trying to find the best data model to adapt a very big MySQL table to Cassandra. The table is structured like this:

CREATE TABLE big_table (
  -- column types are illustrative
  social_id   text,       -- partition key
  remote_id   text,       -- clustering column
  timestamp   timestamp,  -- clustering column
  visibility  int,
  type        text,
  title       text,
  description text,
  other_field text,
  other_field text,
  ...
  PRIMARY KEY ((social_id), remote_id, timestamp)
  )

A page (not itself present in this table) can contain many socials, each of which can contain many remote_ids.

social_id is the partition key; remote_id and timestamp are the clustering columns: remote_id gives uniqueness, timestamp is used to order the results. So far so good.
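For context, a typical read against this model (with made-up values) looks like:

SELECT * FROM big_table
WHERE social_id = 'social_1'
  AND remote_id = 'remote_1';

Within the partition, the rows for that remote come back ordered by timestamp.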

The problem is that users can also search their page contents, filtering by one or more socials, one or more types, visibility (which can be 0, 1, or 2), a date range, or even nothing at all. Plus, based on the filters, users should be able to set visibility.
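To illustrate why the update part is hard: a Cassandra UPDATE must name the full primary key, so a filter-driven visibility change needs one statement per matching row (values are hypothetical):

UPDATE big_table SET visibility = 2
WHERE social_id = 'social_1'
  AND remote_id = 'remote_1'
  AND timestamp = '2017-03-01 10:00:00';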

I tried to handle this case, but I really can't find a sustainable solution. The best I've got is to create another table, which I need to keep in sync with the original one. This table would have:

  • page_id: partition key
  • timestamp, social_id, type, remote_id: clustering key

Plus, create a Materialized View for each combination of filters, which is madness (a sketch follows).
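For concreteness, a minimal sketch of that second table and one such view, with illustrative names and types:

CREATE TABLE page_contents (
  page_id    text,
  timestamp  timestamp,
  social_id  text,
  type       text,
  remote_id  text,
  visibility int,
  PRIMARY KEY ((page_id), timestamp, social_id, type, remote_id)
);

-- one view per filter combination, e.g. filtering by type:
CREATE MATERIALIZED VIEW page_contents_by_type AS
  SELECT * FROM page_contents
  WHERE page_id IS NOT NULL AND timestamp IS NOT NULL
    AND social_id IS NOT NULL AND type IS NOT NULL AND remote_id IS NOT NULL
  PRIMARY KEY ((page_id, type), timestamp, social_id, remote_id);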

Can I avoid creating the second table? What would be the best Cassandra model in this case? Should I consider switching to other technologies?

I'll start from the last questions.

> What would be the best Cassandra model in this case?

As stated in Cassandra: The Definitive Guide, 2nd edition (which I highly recommend reading before choosing or using Cassandra):

> In Cassandra you don't start with the data model; you start with the query model.

You may want to read the chapter about data design, available at Safaribooksonline.com. Basically, Cassandra wants you to think about queries only and not care about normalization.
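To make "query-first" concrete with your tables: start from a question such as "give me everything on a page within a date range" and let it dictate the key layout. Assuming the page_contents sketch above (dates made up):

SELECT * FROM page_contents
WHERE page_id = 'page_1'
  AND timestamp >= '2017-01-01'
  AND timestamp <  '2017-02-01';

The partition key answers "which page", the leading clustering column answers "which dates"; no index is needed.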

So the answer to

> Can I avoid creating the second table?

is: you shouldn't avoid it.

> Should I consider switching to other technologies?

That depends on what you need in terms of replication and partitioning. You may end up building master-master synchronization on top of an RDBMS, or something else. In Cassandra you'll end up with duplicated data between tables, and that's perfectly normal for it: you trade disk space for read/write speed.
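One common way to keep the duplicated tables in step is a logged batch, which ensures both writes are eventually applied together; a minimal sketch against the two tables above (values made up):

BEGIN BATCH
  INSERT INTO big_table (social_id, remote_id, timestamp, visibility, type)
  VALUES ('social_1', 'remote_1', '2017-03-01 10:00:00', 1, 'post');
  INSERT INTO page_contents (page_id, timestamp, social_id, type, remote_id, visibility)
  VALUES ('page_1', '2017-03-01 10:00:00', 'social_1', 'post', 'remote_1', 1);
APPLY BATCH;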

> how to filter and update a big table dynamically?

If, after all of the above, you still want to use a normalized data model in Cassandra, I suggest you look at secondary indexes first and then move on to custom indexes like the Lucene index.
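A minimal secondary-index sketch on the original table (how well it performs depends on the column's cardinality):

CREATE INDEX ON big_table (type);

-- with the partition key restricted, the indexed column can be filtered directly:
SELECT * FROM big_table
WHERE social_id = 'social_1' AND type = 'post';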
