Advice on Converting a Relational Schema to Cassandra

I am hoping to get some suggestions on how best to approach converting a typical relational schema to Cassandra. The relational schema is:

CREATE TABLE IF NOT EXISTS sales (
   sale_id     bigint(20) UNSIGNED NOT NULL
                          AUTO_INCREMENT,
   create_time timestamp  NOT NULL
                          DEFAULT '0000-00-00 00:00:00',
   account     bigint(20) UNSIGNED NOT NULL DEFAULT '0',
   store       char(25)   NOT NULL DEFAULT '',
   product     char(25)   NOT NULL DEFAULT '',
   coupon      char(18)   NOT NULL DEFAULT '',
   amount      decimal(8,2) NOT NULL,
   PRIMARY KEY (sale_id),
   KEY         create_time (create_time) )

The Cassandra schema I've come up with is:

CREATE TABLE sales (
            sale_id     uuid,
            create_time timestamp,
            account     text,
            store       int,
            coupon      text,
            product     text,
            amount      int,
            PRIMARY KEY ((create_time, store), coupon))

(with secondary indexes created on the non-key columns I need to query)

A typical query is to get all sales by product, coupon, account, or store over some time period.

Does this make sense?

Any suggestions on how this could be improved for reasonable read/write performance?

Thanks in advance for any suggestions.

No. To get good performance, you want to model your Cassandra schema so that each table answers one question (query) your application asks. Let's say you want to find all (recent) sales by product: then you want to create your primary key as (productID, created_time), so that all sales of a product live in one partition, sorted by time.
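A minimal sketch of that table in CQL, assuming a hypothetical table name sales_by_product and reusing the question's column names (product, create_time) in place of productID/created_time. Adding sale_id as a final clustering column is an added assumption: without it, two sales of the same product at the same timestamp would share a primary key and the second write would overwrite the first. amount is kept as decimal to match the relational decimal(8,2); the product value and dates in the query are placeholders.

```sql
-- Hypothetical query table: one partition per product,
-- rows inside each partition sorted by sale time.
CREATE TABLE IF NOT EXISTS sales_by_product (
    product     text,
    create_time timestamp,
    sale_id     uuid,
    account     bigint,
    store       text,
    coupon      text,
    amount      decimal,
    PRIMARY KEY ((product), create_time, sale_id)
);

-- All sales of one product in a time window: a range scan on the
-- clustering column inside a single partition, no secondary index needed.
SELECT * FROM sales_by_product
 WHERE product = 'widget-42'
   AND create_time >= '2020-01-01 00:00:00+0000'
   AND create_time <  '2020-02-01 00:00:00+0000';
```

By contrast, with the question's PRIMARY KEY ((create_time, store), coupon), each partition holds the sales for one exact timestamp and store, so a time-range query cannot be served from a single partition.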

If your application normally searches for products that were sold recently, then you want to set the clustering order of the clustering column (created_time in your example) to descending (DESC).
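A sketch of the descending order in CQL, again with a hypothetical table name (recent_sales_by_product) and the question's column names. CLUSTERING ORDER BY (create_time DESC, ...) stores the newest rows first within each partition, so a plain LIMIT returns the most recent sales without any sorting at read time. Using sale_id as the last clustering column is an assumption, and the product value is a placeholder.

```sql
-- Hypothetical table: newest sales first within each product partition.
CREATE TABLE IF NOT EXISTS recent_sales_by_product (
    product     text,
    create_time timestamp,
    sale_id     uuid,
    account     bigint,
    store       text,
    coupon      text,
    amount      decimal,
    PRIMARY KEY ((product), create_time, sale_id)
) WITH CLUSTERING ORDER BY (create_time DESC, sale_id ASC);

-- The 10 most recent sales of one product.
SELECT create_time, store, coupon, amount
  FROM recent_sales_by_product
 WHERE product = 'widget-42'
 LIMIT 10;
```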

Likewise, you may end up duplicating your sales data in multiple column families (tables), one per query. Don't be afraid to duplicate data when modeling for a distributed environment. You want to denormalize so that each query gets its results from a single partition.
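A sketch of that denormalization for two of the other lookups in the question (by store and by coupon), with hypothetical table names; a by-account table would follow the same pattern. Each table is keyed for one query, and every sale is written to all of them. A logged BATCH is one way to keep the copies in step: if any statement in it is applied, all of them eventually are. The values in the INSERTs are placeholders; note that the same sale_id is used in both rows.

```sql
-- Hypothetical per-query tables: same sale, different partition key.
CREATE TABLE IF NOT EXISTS sales_by_store (
    store       text,
    create_time timestamp,
    sale_id     uuid,
    account     bigint,
    product     text,
    coupon      text,
    amount      decimal,
    PRIMARY KEY ((store), create_time, sale_id)
) WITH CLUSTERING ORDER BY (create_time DESC, sale_id ASC);

CREATE TABLE IF NOT EXISTS sales_by_coupon (
    coupon      text,
    create_time timestamp,
    sale_id     uuid,
    account     bigint,
    store       text,
    product     text,
    amount      decimal,
    PRIMARY KEY ((coupon), create_time, sale_id)
) WITH CLUSTERING ORDER BY (create_time DESC, sale_id ASC);

-- Write each sale to every query table in one logged batch.
BEGIN BATCH
  INSERT INTO sales_by_store (store, create_time, sale_id, account, product, coupon, amount)
  VALUES ('store-7', '2020-01-15 10:30:00+0000', 5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 1001, 'widget-42', 'SPRING20', 19.99);
  INSERT INTO sales_by_coupon (coupon, create_time, sale_id, account, store, product, amount)
  VALUES ('SPRING20', '2020-01-15 10:30:00+0000', 5b6962dd-3f90-4c93-8f61-eabfa4a803e2, 1001, 'store-7', 'widget-42', 19.99);
APPLY BATCH;
```

The trade-off is more writes and more storage in exchange for reads that each touch a single partition, which is the usual choice in Cassandra since writes are cheap.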

Hope this helps.
