How to get Cassandra set size?

I want to store information about events in Cassandra. Events belong to groups and are also grouped by time interval (group id = partition key, interval = clustering key). Each event has an id, and within every group I want to store only events whose id is unique within that group. I am thinking of using sets for this and storing the event ids in them. Something like this:

group id (PK) | time (CK) | event ids
1             | 13:00     | {0, 2, 4, 5}
1             | 14:00     | {1, 3}
1             | 15:00     | {}
2             | 13:00     | {}
2             | 14:00     | {2, 4}

When I run a select, I want to get the event counts for a specific group within some time range. For the table above, the group with id 1 and the time range 13:00 - 15:00, the result would be:

13:00 - 4
14:00 - 2
15:00 - 0

I can select all event sets for group 1 in the time range 13:00 - 15:00 and calculate their sizes. That would work, but the event sets can be quite large, and I don't need the event ids themselves (I store them only for uniqueness), only the set sizes. Can I get the set sizes on the Cassandra side using CQL?
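The client-side approach described in the question could look roughly like the following. This is a minimal sketch in Python, assuming the rows for one group have already been fetched as (time, event_ids) pairs; `count_events_per_interval` is a hypothetical helper name, not part of any Cassandra driver.

```python
from typing import Dict, Iterable, Optional, Set, Tuple


def count_events_per_interval(
    rows: Iterable[Tuple[str, Optional[Set[int]]]]
) -> Dict[str, int]:
    """Map each time interval to the size of its event id set."""
    # An empty set may come back as None depending on the driver,
    # so treat None as an empty collection.
    return {time: len(event_ids or ()) for time, event_ids in rows}


# Hypothetical rows for group 1, matching the example table above.
rows = [
    ("13:00", {0, 2, 4, 5}),
    ("14:00", {1, 3}),
    ("15:00", None),
]
counts = count_events_per_interval(rows)
```

The drawback is the one raised in the question: every event id still travels over the network just to be counted.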

Don't use collections for large amounts of data.

Collection (Set): collection size: 2B (2^31); values size: 65535 (2^16-1) (Cassandra 2.1 and later, using native protocol v3)

Instead, put event_id in the primary key.

CREATE TABLE events(
    group_id bigint,
    time bigint,
    event_id bigint,
    PRIMARY KEY(group_id,time,event_id)
);

You can insert data like this:

INSERT INTO events (group_id , time , event_id ) VALUES ( 1, 13, 0);
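Putting event_id in the primary key gives uniqueness because a CQL INSERT with an existing primary key overwrites the row instead of adding a second one. The following is only an in-memory illustration of that behavior, not Cassandra code; `insert_event` and the dict-based `table` are hypothetical stand-ins.

```python
from typing import Dict, Tuple

PrimaryKey = Tuple[int, int, int]  # (group_id, time, event_id)


def insert_event(table: Dict[PrimaryKey, bool], group_id: int, time: int, event_id: int) -> None:
    """Mimic upsert semantics: the same primary key replaces the existing row."""
    table[(group_id, time, event_id)] = True


table: Dict[PrimaryKey, bool] = {}
insert_event(table, 1, 13, 0)
insert_event(table, 1, 13, 0)  # same key again: no duplicate row is created
insert_event(table, 1, 13, 1)
```

Note that with this key an event id is unique per (group_id, time) pair, which matches the one-set-per-interval layout in the question.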

And you can query like this:

SELECT * FROM events WHERE group_id = 1;

It will return all the events in the group.

group_id | time | event_id
----------+------+----------
        1 |   13 |        0
        1 |   13 |        1
        1 |   14 |        2

Use Spark or write a program to compute the counts per group.
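As a rough idea of what such a program might do, here is a minimal Python sketch that counts events per time value from rows shaped like the SELECT output above. It assumes integer time buckets as in the sample data; `count_events_by_time` is a hypothetical name, and fetching the rows from Cassandra is left out.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple[int, int, int]  # (group_id, time, event_id)


def count_events_by_time(rows: Iterable[Row], start: int, end: int) -> Dict[int, int]:
    """Count rows per time value in [start, end], reporting 0 for empty intervals."""
    counts = Counter(time for _group_id, time, _event_id in rows if start <= time <= end)
    # Intervals with no rows do not exist in the table, so fill them in explicitly.
    return {t: counts.get(t, 0) for t in range(start, end + 1)}


# Rows for group 1, copied from the sample output above.
rows = [(1, 13, 0), (1, 13, 1), (1, 14, 2)]
per_time = count_events_by_time(rows, 13, 15)
```

Filling in the zero counts on the client is needed because an interval without events has no rows to count.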

Or use one of these queries to get the count.

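SELECT group_id, time, count(*) FROM events WHERE group_id = 1 AND time = 13; // Count for one group and one time value
SELECT count(*) FROM events WHERE group_id = 1 AND time >= 13 AND time <= 14; // Single total for the group across times 13 to 14
SELECT group_id, time, count(*) FROM events WHERE group_id = 1 AND time >= 13 AND time <= 14 GROUP BY group_id, time; // One count per time value; GROUP BY requires Cassandra 3.10 or later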

Source: https://docs.datastax.com/en/cql/3.1/cql/cql_reference/refLimits.html
