Is it a correct pattern to build composite primary keys using wide-column stores?

HBase and Cassandra are built as wide-column stores, using the concepts of both rows and columns.

A row is composed of a key, similar to the concept of a primary key in an RDBMS, and a value composed of several columns.

It can be represented as follows:

*******|    Key     |                   Value
-------+------------+-------------+------------------------------------------
Columns|            |     name    |                 value
-------+------------+-------------+------------------------------------------
       |     a      |   title     | "Building a python graphdb in one night"
       |     b      |   body      | "You maybe already know that I am..."
       |     c      | publishedat |              "2015-08-23"
       |     d      |   name      |                database

       |     e      |   start     |                   1
       |     f      |    end      |                   2

            ...          ...                         ...

       |    u       |   title     |     "key/value store key composition"

            ...          ...                         ...

       |    x       |   title     |    "building a graphdb with HappyBase"

            ...          ...                         ...
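The layout above can be pictured as a nested map, assuming a row key maps to a set of column name/value pairs. The wide_table dictionary and the get_cell helper below are hypothetical illustrations, not an HBase or Cassandra API. A column that a row does not have simply has no entry, which is what lets each row carry a different set of columns.

```python
# Hypothetical in-memory model of the first layout: row key -> {column name: value}.
# Real stores persist this on disk; this only illustrates the shape of the data.
wide_table = {
    "a": {"title": "Building a python graphdb in one night"},
    "b": {"body": "You maybe already know that I am..."},
    "c": {"publishedat": "2015-08-23"},
    "d": {"name": "database"},
    "e": {"start": 1},
    "f": {"end": 2},
}


def get_cell(table, row_key, column):
    """Return the value stored at (row_key, column), or None if it is absent."""
    return table.get(row_key, {}).get(column)


# Walk rows in sorted row-key order, the order in which an HBase scan returns them.
for row_key in sorted(wide_table):
    for column, value in wide_table[row_key].items():
        print(row_key, column, value)
```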

Is it correct, at the application layer, to build composite primary keys that allow iterating quickly over colocated rows?

This can be represented as follows:

*******|           Key            |                 Value
-------+------------+-------------+------------------------------------------
Columns| identifier |  name       |                 value
-------+------------+-------------+------------------------------------------
       |     1      |   title     | "Building a python graphdb in one night"
       |     1      |   body      | "You maybe already know that I am..."
       |     1      | publishedat |              "2015-08-23"
       |     2      |   name      |                database

       |     3      |   start     |                   1
       |     3      |    end      |                   2

            ...          ...                         ...

       |     4      |   title     |     "key/value store key composition"

            ...          ...                         ...

       |     42     |   title     |    "building a graphdb with HappyBase"

            ...          ...                         ...

The name column moved from the Value to the Key, and the Value now has a single column named value.
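The effect of moving name into the key can be sketched as follows, assuming the store keeps entries sorted by the full composite key (identifier, name). Because all cells of identifier 1 then sort next to each other, reading them is one contiguous range scan rather than several scattered lookups. The sorted list, the bisect lookups and the scan_identifier helper are hypothetical stand-ins for the store's sorted storage.

```python
import bisect

# Entries of the second layout, kept sorted by the composite key
# (identifier, name). Tuples compare element by element, so all entries
# that share an identifier end up adjacent. Keys are unique, so the
# values themselves are never compared during sorting.
rows = sorted([
    ((1, "title"), "Building a python graphdb in one night"),
    ((1, "body"), "You maybe already know that I am..."),
    ((1, "publishedat"), "2015-08-23"),
    ((2, "name"), "database"),
    ((3, "start"), 1),
    ((3, "end"), 2),
    ((4, "title"), "key/value store key composition"),
    ((42, "title"), "building a graphdb with HappyBase"),
])
keys = [key for key, _ in rows]


def scan_identifier(identifier):
    """Return all (name, value) pairs for one identifier via a range scan."""
    # (identifier,) sorts before every (identifier, name) tuple, and
    # (identifier + 1,) sorts after all of them: a half-open key range.
    start = bisect.bisect_left(keys, (identifier,))
    stop = bisect.bisect_left(keys, (identifier + 1,))
    return [(name, value) for (_, name), value in rows[start:stop]]


print(scan_identifier(1))
```

Note that the names come back in key order (body, publishedat, title), not insertion order; that ordering is exactly what makes the range scan possible.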

Compound keys are used all the time when designing Cassandra schemas.

In C*, the key is broken down into two parts: the partition key and the clustering columns.

The partition key is used to hash data to nodes within the cluster. A partition is a bucket of data that can hold a single row or multiple rows, depending on the clustering columns. Data within a partition is local to a node and is kept sorted by the clustering keys, which makes accessing data within a partition fast and efficient and supports range queries on the clustering keys.
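The two-level structure described above can be sketched in memory, assuming a toy cluster in which the partition key is hashed to pick one node and each partition keeps its rows sorted by clustering key. This is only an illustration: by default real Cassandra places partitions on a token ring using the Murmur3 partitioner and replicates each partition to several nodes, whereas the hypothetical ToyCluster class below uses an MD5 hash modulo the node count and no replication.

```python
import bisect
import hashlib


class ToyCluster:
    """Hypothetical model: partition key picks a node, clustering key orders rows."""

    def __init__(self, node_count):
        # One dict per node: partition key -> list of (clustering key, value),
        # kept sorted by clustering key.
        self.nodes = [{} for _ in range(node_count)]

    def node_for(self, partition_key):
        # Stable hash of the partition key chooses the owning node
        # (a stand-in for Cassandra's token ring; no replication here).
        digest = hashlib.md5(repr(partition_key).encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.nodes)

    def insert(self, partition_key, clustering_key, value):
        node = self.nodes[self.node_for(partition_key)]
        partition = node.setdefault(partition_key, [])
        # (clustering_key,) sorts just before (clustering_key, value), so
        # bisect finds the slot without ever comparing values.
        i = bisect.bisect_left(partition, (clustering_key,))
        if i < len(partition) and partition[i][0] == clustering_key:
            partition[i] = (clustering_key, value)  # overwrite, like an upsert
        else:
            partition.insert(i, (clustering_key, value))

    def range_query(self, partition_key, start, stop):
        """Rows with start <= clustering key < stop, all read from one node."""
        node = self.nodes[self.node_for(partition_key)]
        partition = node.get(partition_key, [])
        lo = bisect.bisect_left(partition, (start,))
        hi = bisect.bisect_left(partition, (stop,))
        return partition[lo:hi]


# The question's second layout: identifier is the partition key,
# name is the clustering column.
cluster = ToyCluster(node_count=3)
cluster.insert(1, "title", "Building a python graphdb in one night")
cluster.insert(1, "body", "You maybe already know that I am...")
cluster.insert(1, "publishedat", "2015-08-23")
cluster.insert(2, "name", "database")
# Range over clustering keys in ["a", "q"): returns body and publishedat.
print(cluster.range_query(1, "a", "q"))
```

In CQL, the question's second layout corresponds to a table declared with PRIMARY KEY ((identifier), name): identifier is the partition key and name the clustering column, so a query such as WHERE identifier = 1 AND name >= 'a' AND name < 'q' is served from a single partition.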

C* also allows data fields that are not part of the compound key; these are generally not used to filter queries unless you create a secondary index on them.

The "wide column" terminology is a little outdated for C*. In the current CQL view of things, data is thought of in more traditional terms, as rows in a table that are grouped into partitions which are efficient to access.

So, to answer your question: yes, in C* it is common to move columns that might have been thought of as data columns in an RDBMS into the compound key.

For more information on partition keys and clustering columns, and how they affect the types of queries you can run, see "A deep look at the CQL WHERE clause".

Composite keys are very popular in HBase schema design. They also allow you to do fast range scans on a prefix of the row key. Unlike in Cassandra, the row key is not broken into parts when the data is stored.
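Because HBase compares row keys as raw bytes, the components of a composite row key are usually encoded so that byte order matches the intended logical order. The sketch below assumes a hypothetical layout of an 8-byte big-endian unsigned identifier followed by the UTF-8 name. It shows why a fixed-width encoding matters (as bytes, "10" sorts before "9") and how every row of one identifier shares a byte prefix. The make_row_key helper is illustrative, not part of any HBase client.

```python
import struct


def make_row_key(identifier: int, name: str) -> bytes:
    """Hypothetical composite row key: 8-byte big-endian identifier + UTF-8 name."""
    # ">Q" = big-endian unsigned 64-bit, so numeric order == byte order.
    return struct.pack(">Q", identifier) + name.encode("utf-8")


# Naive string concatenation breaks numeric ordering under byte comparison.
print(sorted([b"9title", b"10title"]))  # [b'10title', b'9title']

keys = sorted([
    make_row_key(42, "title"),
    make_row_key(1, "title"),
    make_row_key(1, "body"),
    make_row_key(9, "title"),
    make_row_key(10, "title"),
])
# With the fixed-width encoding, identifiers come back in numeric order.
print([struct.unpack(">Q", k[:8])[0] for k in keys])  # [1, 1, 9, 10, 42]

# All rows for identifier 1 share the same 8-byte prefix.
prefix = struct.pack(">Q", 1)
print([k[8:].decode("utf-8") for k in keys if k.startswith(prefix)])  # ['body', 'title']
```

Since name is the last component it can be variable length; a variable-length component placed in the middle of the key would need a delimiter or padding to keep prefixes unambiguous.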

Simple example: http://riteshadval.blogspot.com/2012/03/hbase-composite-row-key-design-doing.html

In HBase, with your example, you will be able to do range scans using the identifier alone and also using identifier+name.
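The two kinds of scans can be sketched as half-open byte ranges over sorted row keys, assuming the same kind of hypothetical encoding (a fixed-width 8-byte identifier followed by the name). A prefix scan is equivalent to scanning from the prefix (inclusive) up to the smallest key that is greater than every key with that prefix; the hypothetical prefix_stop helper computes that stop row, and a sorted Python list stands in for the table.

```python
import bisect
import struct
from typing import Optional


def make_row_key(identifier: int, name: str) -> bytes:
    """Hypothetical composite row key: 8-byte big-endian identifier + UTF-8 name."""
    return struct.pack(">Q", identifier) + name.encode("utf-8")


def prefix_stop(prefix: bytes) -> Optional[bytes]:
    """Smallest byte string greater than every key that starts with prefix."""
    b = bytearray(prefix)
    while b and b[-1] == 0xFF:
        b.pop()  # a trailing 0xFF cannot be incremented; drop it
    if not b:
        return None  # prefix is all 0xFF: scan to the end of the table
    b[-1] += 1
    return bytes(b)


def scan(keys, start, stop):
    """Keys k with start <= k < stop (stop=None means unbounded)."""
    lo = bisect.bisect_left(keys, start)
    hi = len(keys) if stop is None else bisect.bisect_left(keys, stop)
    return keys[lo:hi]


keys = sorted(make_row_key(i, n) for i, n in [
    (1, "title"), (1, "body"), (1, "publishedat"), (2, "name"),
    (3, "start"), (3, "end"), (4, "title"), (42, "title"),
])

# Scan 1: identifier only -> every row of identifier 1.
id_prefix = struct.pack(">Q", 1)
print([k[8:].decode() for k in scan(keys, id_prefix, prefix_stop(id_prefix))])

# Scan 2: identifier + name -> only the "title" row of identifier 1.
full_prefix = make_row_key(1, "title")
print([k[8:].decode() for k in scan(keys, full_prefix, prefix_stop(full_prefix))])
```

Against a real table, HappyBase's Table.scan accepts row_start/row_stop or a row_prefix argument that expresses the same ranges.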

Statement: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.