
Why are super columns in Cassandra no longer favoured?

I have read in the latest release that super columns are not desirable due to "performance issues", but nowhere is this explained.

Then I read articles such as this one that give wonderful indexing patterns using super columns.

This leaves me with no idea of what is currently the best way to do indexing in Cassandra.

  1. What are the performance issues of super columns?
  2. Where can I find current best practices for indexing?

Super columns suffer from a number of problems, not least of which is that Cassandra has to deserialize all of the sub-columns of a super column when querying, even if the result will only return a small subset. As a result, there is a practical limit to the number of sub-columns that can be stored per super column before performance suffers.

In theory, this could be fixed within Cassandra by properly indexing sub-columns, but the consensus is that composite columns are a better solution, and they work without the added complexity.
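
To make the cost difference concrete, here is a toy Python model. It is only a sketch and does not reflect Cassandra's actual storage code or on-disk format: the names `read_sub_column` and `read_cell` are hypothetical, and JSON stands in for the real serialization. It contrasts decoding one serialized unit that holds every sub-column with a lookup in individually stored cells sorted by name.

```python
import bisect
import json

# Toy model only: this is not Cassandra's on-disk format.
data = {"msg%04d" % n: "body %d" % n for n in range(1000)}

# Super-column style: all sub-columns are serialized as one unit.
super_column_blob = json.dumps(data)

# Composite-column style: each cell is serialized on its own, sorted by name.
cell_names = sorted(data)
cell_values = [json.dumps(data[name]) for name in cell_names]

def read_sub_column(blob, wanted):
    """Return (value, number of sub-columns decoded) for one sub-column."""
    sub_columns = json.loads(blob)  # decodes every sub-column
    return sub_columns[wanted], len(sub_columns)

def read_cell(names, values, wanted):
    """Return (value, number of cells decoded) for one cell."""
    i = bisect.bisect_left(names, wanted)  # binary search on sorted names
    if i == len(names) or names[i] != wanted:
        raise KeyError(wanted)
    return json.loads(values[i]), 1  # decodes only the matching cell

print(read_sub_column(super_column_blob, "msg0042"))
print(read_cell(cell_names, cell_values, "msg0042"))
```

Both calls return the same value, but the first decodes all 1000 sub-columns to get it while the second decodes a single cell.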

The easiest way to make use of composite columns is to take advantage of the abstraction that CQL 3 provides. Consider the following schema:

CREATE TABLE messages(
    username text,
    sent_at timestamp,
    message text,
    sender text,
    PRIMARY KEY(username, sent_at)
);

Here username is the row key, but the PRIMARY KEY definition groups the row key with the sent_at column (in CQL 3 terms, username is the partition key and sent_at is a clustering column). This is important because it has the effect of indexing that attribute.

INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:42:15', 'Hi', 'alice');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('alice', '2012-08-01 11:42:37', 'Hi yourself', 'bob');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:43:00', 'What are you doing later?', 'alice');
INSERT INTO messages (username, sent_at, message, sender) VALUES ('bob', '2012-08-01 11:47:14', 'Bob?', 'alice');

Behind the scenes, Cassandra will store the inserted data something like this:

alice: (2012-08-01 11:42:37,message): Hi yourself, (2012-08-01 11:42:37,sender): bob
bob:   (2012-08-01 11:42:15,message): Hi,          (2012-08-01 11:42:15,sender): alice, (2012-08-01 11:43:00,message): What are you doing later?, (2012-08-01 11:43:00,sender): alice, (2012-08-01 11:47:14,message): Bob?, (2012-08-01 11:47:14,sender): alice
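
The mapping from CQL 3 rows to this wide-row layout can be sketched in Python. This is an illustration under assumptions, not Cassandra's implementation: `to_wide_rows` is a hypothetical helper, and plain dicts stand in for the storage engine. Each CQL row contributes one cell per non-key column, named by the composite (sent_at, column name) and kept in sorted order.

```python
from datetime import datetime

def to_wide_rows(cql_rows):
    """Flatten CQL-style rows into one wide row per username (toy model)."""
    wide = {}
    for row in cql_rows:
        cells = wide.setdefault(row["username"], {})
        for column in ("message", "sender"):
            # Composite cell name: (clustering value, CQL column name)
            cells[(row["sent_at"], column)] = row[column]
    # Keep the cells of each wide row sorted by composite name.
    return {key: dict(sorted(cells.items())) for key, cells in wide.items()}

rows = [
    {"username": "bob", "sent_at": datetime(2012, 8, 1, 11, 42, 15),
     "message": "Hi", "sender": "alice"},
    {"username": "alice", "sent_at": datetime(2012, 8, 1, 11, 42, 37),
     "message": "Hi yourself", "sender": "bob"},
    {"username": "bob", "sent_at": datetime(2012, 8, 1, 11, 43, 0),
     "message": "What are you doing later?", "sender": "alice"},
    {"username": "bob", "sent_at": datetime(2012, 8, 1, 11, 47, 14),
     "message": "Bob?", "sender": "alice"},
]

wide_rows = to_wide_rows(rows)
for key in sorted(wide_rows):
    for (sent_at, column), value in wide_rows[key].items():
        print(key, sent_at, column, value)
```

Because cells that share a sent_at value sort next to each other, all the columns of one CQL row end up adjacent inside the wide row.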

But using CQL 3, we can query the "row" using a sent_at predicate, and get back a tabular result set.

SELECT * FROM messages WHERE username = 'bob' AND sent_at > '2012-08-01';
 username | sent_at                  | message                   | sender
----------+--------------------------+---------------------------+--------
      bob | 2012-08-01 11:42:15+0000 |                        Hi |  alice
      bob | 2012-08-01 11:43:00+0000 | What are you doing later? |  alice
      bob | 2012-08-01 11:47:14+0000 |                      Bob? |  alice
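
Why a sent_at predicate is cheap can be sketched in Python. Again this is a toy model rather than Cassandra's code: `select_sent_after` is a hypothetical function and a sorted list stands in for bob's wide row. Since the cells are sorted by (sent_at, column name), the predicate becomes a binary search for the start of a contiguous slice, and the cells in that slice are regrouped into tabular rows.

```python
import bisect
from datetime import datetime

# Toy model of bob's wide row: cells sorted by composite name (sent_at, column).
bob_cells = [
    ((datetime(2012, 8, 1, 11, 42, 15), "message"), "Hi"),
    ((datetime(2012, 8, 1, 11, 42, 15), "sender"), "alice"),
    ((datetime(2012, 8, 1, 11, 43, 0), "message"), "What are you doing later?"),
    ((datetime(2012, 8, 1, 11, 43, 0), "sender"), "alice"),
    ((datetime(2012, 8, 1, 11, 47, 14), "message"), "Bob?"),
    ((datetime(2012, 8, 1, 11, 47, 14), "sender"), "alice"),
]

def select_sent_after(username, cells, lower_bound):
    """Model of: WHERE username = ? AND sent_at > ? (toy version)."""
    sent_ats = [name[0] for name, _ in cells]
    # Index of the first cell whose sent_at is strictly greater than the bound.
    start = bisect.bisect_right(sent_ats, lower_bound)
    result = {}
    for (sent_at, column), value in cells[start:]:
        # Regroup adjacent cells with the same sent_at into one tabular row.
        row = result.setdefault(
            sent_at, {"username": username, "sent_at": sent_at})
        row[column] = value
    return list(result.values())

for row in select_sent_after("bob", bob_cells, datetime(2012, 8, 1, 11, 45, 0)):
    print(row["username"], row["sent_at"], row["message"], row["sender"])
```

With a lower bound of 11:45:00, only the 11:47:14 message is returned, and no cell before the slice is examined beyond the binary search.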
