Comparing Cassandra structure with Relational Databases

A few days ago I read about the wide-column store type of NoSQL database, and specifically about Apache Cassandra.

What I understand is that Cassandra consists of:

A keyspace (like a database in a relational database system), which contains many column families or tables (like tables in a relational database), each holding an unlimited number of rows.

From the Stack Overflow tag description:

A wide column store is a type of key-value database. It uses tables, rows, and columns, but unlike a relational database, the names and format of the columns can vary from row to row in the same table.

In Cassandra, every row in a table must have a row key, and each row key can have multiple columns. I have read about the differences between relational databases and NoSQL databases (Cassandra) in how they are implemented and how they store data.

But I don't understand the difference in structure.

Imagine a scenario in which I have a table (a column family in Cassandra):

When I execute a CQL query like this:

select * from users;

I get this result:

lastname  | age  | city          | email               
----------+------+---------------+----------------------
      Doe |   36 | Beverly Hills | janedoe@email.com       
    Jones |   35 |        Austin | bob@example.com        
    Byrne |   24 |     San Diego | robbyrne@email.com         
    Smith |   46 |    Sacramento | null                    
   Jones2 | null |        Austin | bob@example.com       

Then I reproduce the same scenario in a relational database (MS SQL) with the following query:

select * from [users] 

And the result is:

lastname  | age  | city          | email               
----------+------+---------------+----------------------
      Doe |   36 | Beverly Hills | janedoe@email.com       
    Jones |   35 |        Austin | bob@example.com        
    Byrne |   24 |     San Diego | robbyrne@email.com         
    Smith |   46 |    Sacramento | NULL                    
   Jones2 | NULL |        Austin | bob@example.com       

I know that Cassandra supports dynamic columns, and I can add one with something like:

ALTER TABLE users ADD website varchar;

But this is available in the relational model too; for example, in MS SQL the same change can be made with something like:

ALTER TABLE users ADD website varchar(MAX);

What I see is that the results of the two selects are the same. In Cassandra, the row key (lastname) is treated as a standalone object, but it behaves the same as a unique field (such as an ID or a text key) in MS SQL (and in every other relational database). I also see that the column types in Cassandra are fixed (varchar in my example), unlike what the Stack Overflow tag description says.

So my questions are:

  1. Am I misunderstanding something about Cassandra?

  2. So what is the difference between the two structures? As I showed, the results are the same.

  3. Are there any special scenarios (JSON-like ones, for example) that cannot be implemented in relational databases but that Cassandra supports? (For example, I know that Cassandra does not support nested columns.)

Thank you for reading.

We have to look at a more complex example to see the differences :)

To start:

  • the term column family was used in the older Thrift API
  • in the newer CQL API, the term table is used

A table is defined as a "two-dimensional view of a multi-dimensional column family".

The term "wide rows" relates mainly to the Thrift API. In CQL the model is defined a bit differently, but the underlying storage looks the same.

Now compare SQL and CQL. In SQL, a table is a set of rows. In a simple example CQL looks the same, but it is not: a CQL table is a set of partitions, where each partition can hold a single row (e.g. when you don't have a clustering key) or multiple rows. In Thrift terminology, a partition containing multiple rows is called a "wide row". To see how it is stored underneath, read e.g. the part about composite keys from here.
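To make the "set of partitions" idea concrete, here is a minimal in-memory Python sketch, assuming nothing about nodes or the on-disk format. ToyTable, rows_in and the "Dallas" row are hypothetical, made up for illustration; the only real behaviour it mirrors is that a partition holds one row without a clustering key and can hold many with one.

```python
from collections import defaultdict


class ToyTable:
    """Toy model of a CQL table as a set of partitions. Each partition maps
    clustering-key tuples (empty when there is no clustering key) to the
    remaining columns of a row."""

    def __init__(self, partition_key, clustering_keys=()):
        self.partition_key = partition_key
        self.clustering_keys = tuple(clustering_keys)
        # partition key value -> {clustering tuple -> other columns}
        self.partitions = defaultdict(dict)

    def insert(self, row):
        pk = row[self.partition_key]
        ck = tuple(row[c] for c in self.clustering_keys)
        others = {k: v for k, v in row.items()
                  if k != self.partition_key and k not in self.clustering_keys}
        # Writing an existing primary key updates that row (an upsert)
        self.partitions[pk].setdefault(ck, {}).update(others)

    def rows_in(self, pk):
        """Rows of one partition, in clustering order."""
        part = self.partitions.get(pk, {})
        return [part[ck] for ck in sorted(part)]


# No clustering key: each partition holds exactly one row
users = ToyTable("lastname")
users.insert({"lastname": "Doe", "age": 36, "city": "Beverly Hills"})
users.insert({"lastname": "Jones", "age": 35, "city": "Austin"})

# Hypothetical clustering key "city": one partition can hold many rows
# (what Thrift called a "wide row")
by_city = ToyTable("lastname", clustering_keys=["city"])
by_city.insert({"lastname": "Jones", "city": "Austin", "email": "bob@example.com"})
by_city.insert({"lastname": "Jones", "city": "Dallas", "email": "bob@example.com"})

print(len(users.partitions), [len(p) for p in users.partitions.values()])  # 2 [1, 1]
print(len(by_city.partitions), len(by_city.rows_in("Jones")))  # 1 2
```

Both tables print the same kind of rows in a SELECT, which is why the simple example in the question looks identical to SQL; the difference only shows once a partition holds more than one row.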

There are more differences:

  • CQL can have static columns, which are stored at the partition level: it looks as if every row in the partition has the same value, but it is really a single value stored one level up. Static columns can also be used to model 1:N relations.
  • In CQL you can have collection-type columns: set, list, and map.
  • A column can contain a user-defined type (for example, you can define address as a type and reuse it in many places), and a collection can hold user-defined types.
  • On the other hand, CQL does not support JOINs, which SQL has, and you have to structure your tables very carefully because they must be strictly query-oriented (in Cassandra you can't query data by an arbitrary column value, and secondary indexes have many limitations). It is often said that in the relational model you design tables based on the data, whereas in Cassandra you design them based on the queries.
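A toy Python sketch of two points from the list above: a static column stored once per partition (the "1" side of a 1:N relation between a customer and their orders), and query-first design, where instead of a JOIN or a filter on an arbitrary column the same data is also written to a second table keyed for the other query. The tables orders_by_customer and orders_by_item, their columns and the functions are all hypothetical; this is an in-memory model, not a driver API.

```python
# Hypothetical table 1: orders_by_customer
#   PRIMARY KEY ((customer), order_id), customer_email declared STATIC
orders_by_customer = {}
# Hypothetical table 2: orders_by_item
#   PRIMARY KEY ((item), customer, order_id): same data, keyed for another query
orders_by_item = {}


def record_order(customer, order_id, item, email=None):
    """Write one order to both query-specific tables (denormalization)."""
    part = orders_by_customer.setdefault(customer, {"static": {}, "rows": {}})
    if email is not None:
        # A static column holds a single value for the whole partition
        part["static"]["customer_email"] = email
    part["rows"][order_id] = {"item": item}
    # No JOINs: the second query gets its own copy of the data
    orders_by_item.setdefault(item, {})[(customer, order_id)] = {}


def orders_of(customer):
    """Single-partition read; the static value appears on every returned row."""
    part = orders_by_customer.get(customer, {"static": {}, "rows": {}})
    return [{"customer": customer, "order_id": oid, **part["static"], **cols}
            for oid, cols in sorted(part["rows"].items())]


def customers_who_bought(item):
    """Served by the second table instead of scanning orders_by_customer."""
    return sorted({customer for customer, _ in orders_by_item.get(item, {})})


record_order("Jones", 1, "book", email="bob@example.com")
record_order("Jones", 2, "lamp")
record_order("Doe", 1, "book", email="janedoe@email.com")

print(orders_of("Jones"))
print(customers_who_bought("book"))  # ['Doe', 'Jones']
```

The cost of this design is that every write goes to both tables, which is the usual trade-off of modelling by query rather than by data.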

I hope this makes it a bit clearer. I recommend watching some videos (or reading the slides) from the DataStax Core Concepts course as a solid introduction to Cassandra.

In my experience, CQL misleads a lot of people. First of all, you would never want to run:

SELECT * FROM a_table_here; 

Doing this on a production Cassandra cluster puts a huge load on the coordinator node, which has to aggregate the data from all of the other nodes. Also, by default you will get back at most 10000 "rows".
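A toy Python sketch of why the unrestricted SELECT is expensive compared with a read by partition key. The three-node layout, OWNER and the function names are hypothetical; replication, paging and real token-range iteration are left out, and the limit default simply mirrors the 10000 mentioned above.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]

# Hypothetical 3-node cluster, one replica per partition (replication ignored).
# Each node maps partition key (lastname) -> row.
NODES: List[Dict[str, Row]] = [
    {"Doe": {"age": 36, "city": "Beverly Hills"}},
    {"Jones": {"age": 35, "city": "Austin"},
     "Byrne": {"age": 24, "city": "San Diego"}},
    {"Smith": {"age": 46, "city": "Sacramento"}},
]

# Cassandra computes the owner from the partition key's token; this dict only
# stands in for that calculation in the toy.
OWNER = {pk: i for i, node in enumerate(NODES) for pk in node}


def full_scan(limit: int = 10000) -> Tuple[List[Row], int]:
    """SELECT * FROM users: the coordinator has to query node after node and
    merge everything itself until it has `limit` rows."""
    rows: List[Row] = []
    contacted = 0
    for node in NODES:
        contacted += 1
        rows.extend({"lastname": pk, **cols} for pk, cols in node.items())
        if len(rows) >= limit:
            break
    return rows[:limit], contacted


def read_partition(lastname: str) -> Tuple[List[Row], int]:
    """SELECT * FROM users WHERE lastname = ?: only the owning node is asked."""
    node = NODES[OWNER[lastname]]
    return [{"lastname": lastname, **node[lastname]}], 1


rows, contacted = full_scan()
print(len(rows), contacted)  # 4 3
rows, contacted = read_partition("Jones")
print(rows, contacted)  # [{'lastname': 'Jones', 'age': 35, 'city': 'Austin'}] 1
```

With three nodes the gap looks small, but the full scan grows with the whole cluster while the keyed read stays at the replicas of one partition.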

To understand how Cassandra stores your data, we need to establish a few terms first:

First there's the primary key, in your case lastname. Its partition-key part (here the whole key) is hashed to a token, which determines which node in the cluster owns that token range, and the row is stored there (plus on any replica nodes).
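A minimal Python sketch of the hashing step, assuming a token ring where each node owns the range up to its own token and extra replicas go to the next nodes clockwise (roughly what SimpleStrategy does). The node names and tokens are hypothetical, and MD5 from the standard library is a stand-in hash: Cassandra's default Murmur3Partitioner uses Murmur3 with a signed token range.

```python
import bisect
import hashlib
from typing import Dict, List


def token(partition_key: str) -> int:
    """Map a partition key onto an unsigned 64-bit ring position.

    Stand-in hash: MD5 is used only because it ships with Python's standard
    library; it is not the hash of Cassandra's default partitioner."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_tokens: Dict[str, int]):
        # Nodes sorted by token; a node owns (previous node's token, its token]
        self.entries = sorted((t, name) for name, t in node_tokens.items())
        self.tokens = [t for t, _ in self.entries]

    def replicas(self, partition_key: str, rf: int = 3) -> List[str]:
        """Owner of the key's token plus the next rf-1 nodes clockwise."""
        n = len(self.entries)
        # First node whose token >= the key's token, wrapping past the end
        start = bisect.bisect_left(self.tokens, token(partition_key)) % n
        return [self.entries[(start + k) % n][1] for k in range(min(rf, n))]


RING = 2 ** 64
# Hypothetical 4-node cluster with evenly spaced tokens
ring = Ring({name: (i + 1) * RING // 4 - 1 for i, name in enumerate("ABCD")})
print(ring.replicas("Doe"))        # owner first, then two replicas
print(ring.replicas("Doe", rf=1))  # just the owner
```

The point to take away is that placement depends only on the key's hash, which is why a query has to supply the partition key to avoid asking every node.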

Next there are clustering columns. I don't know whether you have any in your example, but you define them like PRIMARY KEY ((lastname), age, city). In that example you are clustering by age first and then by city, and rows within a partition are stored in that order.
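A small Python sketch of what "ordered" buys you, assuming the PRIMARY KEY ((lastname), age, city) from the answer: rows of one partition stay sorted by (age, city) as they are written, so a range condition on age maps to one contiguous slice. The Partition class, age_between and the Jones rows are hypothetical; the binary search is just how this toy finds the slice.

```python
import bisect
from typing import Dict, List, Tuple

Key = Tuple[int, str]  # (age, city)


class Partition:
    """One partition of a table with PRIMARY KEY ((lastname), age, city):
    rows are kept sorted by age first, then city."""

    def __init__(self) -> None:
        self.keys: List[Key] = []                    # clustering keys, always sorted
        self.cells: Dict[Key, Dict[str, str]] = {}   # clustering key -> other columns

    def upsert(self, age: int, city: str, **columns: str) -> None:
        key = (age, city)
        if key not in self.cells:
            bisect.insort(self.keys, key)  # keep clustering order on write
            self.cells[key] = {}
        self.cells[key].update(columns)

    def age_between(self, lo: int, hi: int) -> List[Tuple[Key, Dict[str, str]]]:
        """Like WHERE lastname = ? AND age >= lo AND age <= hi: one contiguous
        slice of the sorted keys."""
        start = bisect.bisect_left(self.keys, (lo,))    # (lo,) sorts before (lo, any city)
        end = bisect.bisect_left(self.keys, (hi + 1,))  # first key with age > hi (integer ages)
        return [(k, self.cells[k]) for k in self.keys[start:end]]


jones = Partition()  # hypothetical rows, written out of order on purpose
jones.upsert(41, "Boston", email="bob@example.com")
jones.upsert(35, "Austin", email="bob@example.com")
jones.upsert(35, "Albany", email="bob@example.com")
print(jones.keys)  # [(35, 'Albany'), (35, 'Austin'), (41, 'Boston')]
print([k for k, _ in jones.age_between(30, 40)])  # [(35, 'Albany'), (35, 'Austin')]
```

The flip side of this ordering is that filtering on city alone, without age, cannot use it, which is one reason CQL restricts which WHERE clauses are allowed.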

In a simplified, high-level view of your use case, Cassandra stores the data as a Map to an ordered Multimap:

Doe -> 36:Beverly Hills -> janedoe@email.com

Here 'Doe' is the primary key, which tells you which node(s) hold that row of data; 36:Beverly Hills is the ordered clustering key (the key of the ordered multimap); and janedoe@email.com is the final value of the Map to a Multimap (there can be several values, mind you).
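The same nested view written out as a Python sketch, assuming the answer's hypothetical PRIMARY KEY ((lastname), age, city): an outer map keyed by partition key, an inner map keyed by the (age, city) clustering tuple, and the remaining columns as values. put, render and the extra phone column are made up for illustration; the phone value only shows that one clustering key can carry several values.

```python
from typing import Dict, List, Tuple

# Map<partition key, ordered map<clustering key, column values>>
Store = Dict[str, Dict[Tuple[int, str], Dict[str, str]]]


def put(store: Store, lastname: str, age: int, city: str, **columns: str) -> None:
    # Writing the same (lastname, age, city) again just updates its columns
    store.setdefault(lastname, {}).setdefault((age, city), {}).update(columns)


def render(store: Store) -> List[str]:
    """Format each row as 'partition -> age:city -> values'. Partitions come
    out in insertion order here; Cassandra itself orders them by token."""
    lines = []
    for lastname, rows in store.items():
        for age, city in sorted(rows):  # clustering order within the partition
            values = ", ".join(rows[(age, city)].values())
            lines.append(f"{lastname} -> {age}:{city} -> {values}")
    return lines


store: Store = {}
put(store, "Doe", 36, "Beverly Hills", email="janedoe@email.com")
put(store, "Byrne", 24, "San Diego", email="robbyrne@email.com")
put(store, "Doe", 36, "Beverly Hills", phone="555-0100")  # hypothetical second value
for line in render(store):
    print(line)
# Doe -> 36:Beverly Hills -> janedoe@email.com, 555-0100
# Byrne -> 24:San Diego -> robbyrne@email.com
```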

I left out a lot of nuances to keep the example simple; for a more in-depth explanation I highly suggest reading: http://www.planetcassandra.org/making-the-change-from-thrift-to-cql/
