For my scenario, the Cassandra (CQL) schema/tables are the same as in an RDBMS

Question

I have looked at the Twissandra examples. I asked a similar question regarding this a few days back and received some tips I implemented here. However, by looking at the tables (column families) I see barely any difference between this and a relational database.

My scenario: a simple address book where a user can create his own contacts and group them (one contact can be placed in many groups, and one group can contain many contacts). A contact may, for example, have multiple addresses.

I want to retrieve all the contacts who live at address x and are placed in group y. Therefore, I did the following:

CREATE TABLE if not exists User (user_id uuid, contact_id uuid, type varchar, email varchar, PRIMARY KEY(user_id));
CREATE TABLE if not exists Contact (contact_id uuid, firstname varchar, lastname varchar, photo blob, imagelength int, note varchar, PRIMARY KEY (contact_id));
CREATE TABLE if not exists Address (address_id uuid, contact_id uuid, street varchar, number int, zipcode varchar, country varchar, PRIMARY KEY(address_id));
CREATE TABLE if not exists Group (group_id uuid, user_id uuid, groupname varchar, PRIMARY KEY(group_id));
CREATE TABLE if not exists Group_Contact (group_id uuid, contact_id uuid, PRIMARY KEY(group_id, contact_id));

However, based on this, it is literally the same as a relational database, except that I believe Cassandra lays the data out on disk differently than an RDBMS does. I don't see how this could be done better in Cassandra, or whether I have even modeled it the right way. It just feels like a plain relational database. I feel that I did something wrong, since I have to use application-level joins to get the addresses of the contacts. I really don't know how I can denormalize this to allow multiple addresses (and maybe even phones and emails).
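To make the cost of that application-level join concrete, here is a minimal in-memory Python sketch of answering "contacts in group y at address x" with the normalized schema. Plain dicts stand in for the tables above; this is not a Cassandra client. The sample data is hypothetical, and zipcode is assumed to stand in for "address x".

```python
from uuid import uuid4

# Hypothetical sample IDs
alice, bob = uuid4(), uuid4()
friends = uuid4()

# Group_Contact: PRIMARY KEY(group_id, contact_id), i.e. one partition per group
group_contact = {friends: [alice, bob]}

# Contact: PRIMARY KEY(contact_id)
contact = {
    alice: {"firstname": "Alice"},
    bob: {"firstname": "Bob"},
}

# Address: PRIMARY KEY(address_id); contact_id is a regular column, so finding
# a contact's addresses needs a scan (ALLOW FILTERING) or a secondary index.
address = {
    uuid4(): {"contact_id": alice, "zipcode": "10023"},
    uuid4(): {"contact_id": bob, "zipcode": "W11 2BQ"},
}


def contacts_in_group_at_zip(group_id, zipcode):
    """Application-level join across Group_Contact, Contact and Address."""
    names = []
    for contact_id in group_contact.get(group_id, []):  # read 1: the group's members
        row = contact[contact_id]                        # read 2: one lookup per member
        # read 3: this member's addresses; a scan, since Address is keyed by address_id
        if any(a["contact_id"] == contact_id and a["zipcode"] == zipcode
               for a in address.values()):
            names.append(row["firstname"])
    return names


print(contacts_in_group_at_zip(friends, "10023"))  # ['Alice']
```

Every group member costs extra round trips, and the Address step cannot be answered by key at all. That is the symptom the answers below address by denormalizing.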

Any suggestions to improve this scenario would be greatly appreciated!

Answer 1

As jny indicated, data duplication, denormalization, and query-based modeling are the keys to building good Cassandra data models. If I wanted to take your tables above and build a table to support address/contact queries based on country, I could do it like this:

First, I'll create a user-defined type for the contact's address.

aploetz@cqlsh:stackoverflow> CREATE TYPE contactAddress (
             ...   street varchar, 
             ...   city varchar,
             ...   zip_code varchar,
             ...   country varchar);

Next, I'll create a table called UserContactsByCountry to store user contact info, as well as any user contact addresses:

aploetz@cqlsh:stackoverflow> CREATE TABLE UserContactsByCountry (
             ...   country varchar,
             ...   user_id uuid,
             ...   type varchar,
             ...   email varchar,
             ...   firstname varchar,
             ...   lastname varchar,
             ...   photo blob,
             ...   imagelength int,
             ...   note varchar,
             ...   addresses map<text, frozen <contactAddress>>,
             ...   PRIMARY KEY ((country),user_id));

A couple of things to note here:

I am using country as a partitioning key for querying, and adding user_id as a clustering key for uniqueness.
Technically, country is being stored multiple times in each row: once as the partition key, and again within each address. Note that the country partition key is the one that allows us to run our query.
I assume that user contacts can have multiple addresses, so I'll store them in a map with text (varchar) keys and contactAddress values (the type created above).
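The partitioning notes above can be pictured with a small Python model. It is a sketch under simplifying assumptions, not how Cassandra stores data internally: each partition is a dict keyed by country, rows inside it are keyed by user_id, and a frozen dataclass plays the role of frozen<contactAddress> (the address is replaced as a whole value). The sample rows mirror the three contacts inserted in this answer, trimmed to a few columns.

```python
from collections import defaultdict
from dataclasses import dataclass
from uuid import UUID, uuid4


@dataclass(frozen=True)  # like frozen<contactAddress>: replaced whole, not field by field
class ContactAddress:
    street: str
    city: str
    zip_code: str
    country: str


# PRIMARY KEY ((country), user_id): partition key -> {clustering key -> row}
user_contacts_by_country: dict[str, dict[UUID, dict]] = defaultdict(dict)


def insert(country: str, user_id: UUID, firstname: str,
           addresses: dict[str, ContactAddress]) -> None:
    # Like a CQL INSERT, writing an existing primary key again overwrites the row (upsert).
    user_contacts_by_country[country][user_id] = {
        "firstname": firstname, "addresses": addresses}


def select_by_country(country: str) -> list[dict]:
    # WHERE country = ? reads exactly one partition. Rows come back in user_id
    # order; Python's UUID ordering is used here and may differ from Cassandra's.
    partition = user_contacts_by_country.get(country, {})
    return [partition[uid] for uid in sorted(partition)]


nyc_office = ContactAddress("101 Big Network Drive", "New York", "10023", "USA")
insert("USA", uuid4(), "Bryce", {"work": nyc_office,
       "home": ContactAddress("8192 N. 42nd St.", "New York", "10025", "USA")})
insert("USA", uuid4(), "Edison", {"work": nyc_office,
       "home": ContactAddress("76534 N. 62nd St.", "New York", "10024", "USA")})
insert("GBR", uuid4(), "Theora", {"work": nyc_office,
       "home": ContactAddress("821 Wembley St.", "London", "W11 2BQ", "GBR")})

print(sorted(r["firstname"] for r in select_by_country("USA")))  # ['Bryce', 'Edison']
```

Theora has a USA work address, yet she is not returned for 'USA' because her row lives in the GBR partition. The value you choose as the partition key decides which queries the table can answer.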

Next, I'll insert three user contacts, each with two addresses: two contacts in the USA and one in Great Britain.

aploetz@cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
VALUES ('USA',uuid(),'Tech','brycelynch@network23.com','Bryce','Lynch','Head of R&D at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'8192 N. 42nd St.',city:'New York',zip_code:'10025',country:'USA'}});
aploetz@cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
VALUES ('USA',uuid(),'Reporter','edisoncarter@network23.com','Edison','Carter','Reporter at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'76534 N. 62nd St.',city:'New York',zip_code:'10024',country:'USA'}});
aploetz@cqlsh:stackoverflow> INSERT INTO usercontactsbycountry (country, user_id, type, email, firstname, lastname, note, addresses)
VALUES ('GBR',uuid(),'Reporter','theorajones@network23.com','Theora','Jones','Controller at Network 23',{'work':{street:'101 Big Network Drive',city:'New York',zip_code:'10023',country:'USA'},'home':{street:'821 Wembley St.',city:'London',zip_code:'W11 2BQ',country:'GBR'}});

Now I can query that table for all user contacts in the USA:

aploetz@cqlsh:stackoverflow> SELECT * FROM usercontactsbycountry WHERE country ='USA';
 country | user_id                              | addresses                                                                                                                                                                                    | email                      | firstname | imagelength | lastname | note                      | photo | type
---------+--------------------------------------+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+----------------------------+-----------+-------------+----------+---------------------------+-------+----------
     USA | 2dee94e2-4887-4988-8cf5-9aee5fd0ea1e |  {'home': {street: '8192 N. 42nd St.', city: 'New York', zip_code: '10025', country: 'USA'}, 'work': {street: '101 Big Network Drive', city: 'New York', zip_code: '10023', country: 'USA'}} |   brycelynch@network23.com |     Bryce |        null |    Lynch | Head of R&D at Network 23 |  null |     Tech
     USA | b92612dd-dbaa-42f2-8ff2-d36b6c601aeb | {'home': {street: '76534 N. 62nd St.', city: 'New York', zip_code: '10024', country: 'USA'}, 'work': {street: '101 Big Network Drive', city: 'New York', zip_code: '10023', country: 'USA'}} | edisoncarter@network23.com |    Edison |        null |   Carter |    Reporter at Network 23 |  null | Reporter

(2 rows)

There are probably other ways in which this could be modeled, but this is one that I hoped to use to help you understand some of the techniques available.

Answer 2

It is difficult to switch from modeling for relational databases to modeling for Cassandra, because the two seem so similar: the query language looks almost the same. But the first rule of Cassandra is to model around your queries, whereas in relational databases we model around the data. That means:

Consider which queries you run most often
Learn about partition keys and clustering keys
Don't be afraid of data duplication
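The three points above can be sketched in Python with two in-memory tables, one per query, both written whenever a contact is saved. The table names, key choices and sample data are hypothetical illustrations, not taken from either answer. The commented PRIMARY KEY lines show the CQL tables each dict is meant to imitate.

```python
from collections import defaultdict
from uuid import uuid4

# Query A: "contacts of user U in group G with an address in zip code Z"
#   ~ PRIMARY KEY ((user_id, group_id), zip_code, contact_id)   (hypothetical table)
contacts_by_group_zip = defaultdict(dict)
# Query B: "all contacts of user U"
#   ~ PRIMARY KEY ((user_id), contact_id)                       (hypothetical table)
contacts_by_user = defaultdict(dict)


def save_contact(user_id, contact_id, name, group_ids, addresses):
    """Denormalize at write time: store the contact under every key a query reads.

    In Cassandra each of these rows is a separate physical copy; here they share one dict.
    """
    row = {"name": name, "addresses": addresses}
    for group_id in group_ids:                 # one row per group ...
        for addr in addresses.values():        # ... and per address zip code
            contacts_by_group_zip[(user_id, group_id)][(addr["zip_code"], contact_id)] = row
    contacts_by_user[user_id][contact_id] = row


def find_in_group_at_zip(user_id, group_id, zip_code):
    # Mirrors a single-partition CQL read with zip_code restricted as the first
    # clustering column; this sketch simply filters that partition's rows.
    partition = contacts_by_group_zip.get((user_id, group_id), {})
    return sorted(r["name"] for (z, _), r in partition.items() if z == zip_code)


user, friends, colleagues = uuid4(), uuid4(), uuid4()
save_contact(user, uuid4(), "Bryce", [friends, colleagues],
             {"work": {"zip_code": "10023"}, "home": {"zip_code": "10025"}})
save_contact(user, uuid4(), "Theora", [colleagues],
             {"work": {"zip_code": "10023"}, "home": {"zip_code": "W11 2BQ"}})

print(find_in_group_at_zip(user, colleagues, "10023"))  # ['Bryce', 'Theora']
print(find_in_group_at_zip(user, friends, "10023"))     # ['Bryce']
```

Saving Bryce writes four rows to the first table (two groups times two zip codes) plus one to the second. That duplication is the price of answering each query with a single partition read instead of an application-level join.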

There is a good example of data modeling in Cassandra: https://www.datastax.com/documentation/cql/3.1/cql/ddl/ddl_music_service_c.html

Answer 1 (score 3, accepted): 2015-03-24 18:55:19

Answer 2 (score 2): 2015-03-24 14:15:09
