
What are the ways to select distinct count in Cassandra?

I need to select a distinct count from a table in Cassandra.

As I understand it, a direct distinct count is not supported in Cassandra, and neither are nested queries like in an RDBMS:

select count(*) from (select distinct key_part_one from stackoverflow_composite) as count;

SyntaxException: line 1:21 no viable alternative at input '(' (select count(*) from [(]...)

What are the ways to get it? Can I get it directly from Cassandra, or do I need to use add-on tools/languages?

Below is my CREATE TABLE statement:

CREATE TABLE nishant_ana.ais_profile_table (
    profile_key text,
    profile_id text,
    last_update_day date,
    last_transaction_timestamp timestamp,
    last_update_insertion_timestamp timeuuid,
    profile_data blob,
    PRIMARY KEY ((profile_key, profile_id), last_update_day)
) WITH CLUSTERING ORDER BY (last_update_day DESC)
    AND bloom_filter_fp_chance = 0.01
    AND caching = {'keys': 'ALL', 'rows_per_partition': 'NONE'}
    AND comment = ''
    AND compaction = {'class': 'org.apache.cassandra.db.compaction.SizeTieredCompactionStrategy', 'max_threshold': '32', 'min_threshold': '4'}
    AND compression = {'chunk_length_in_kb': '64', 'class': 'org.apache.cassandra.io.compress.LZ4Compressor'}
    AND crc_check_chance = 1.0
    AND dclocal_read_repair_chance = 0.1
    AND default_time_to_live = 0
    AND gc_grace_seconds = 864000
    AND max_index_interval = 2048
    AND memtable_flush_period_in_ms = 0
    AND min_index_interval = 128
    AND read_repair_chance = 0.0
    AND speculative_retry = '99PERCENTILE';

I have just started using Cassandra.

From Cassandra itself you can only do a `select distinct partition_key from ...`.
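For the table above the partition key is composite, so a DISTINCT selection must name both of its columns. A sketch of the only distinct query CQL permits here (the client then counts the returned rows itself, since wrapping this in `count(*)` is not valid CQL):

```sql
-- CQL allows DISTINCT only on the full partition key.
-- For ais_profile_table that is (profile_key, profile_id):
SELECT DISTINCT profile_key, profile_id
FROM nishant_ana.ais_profile_table;
```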

If you need something like this, you can use Spark + the Spark Cassandra Connector. It will work, but don't expect truly real-time answers, as it needs to read the necessary data from all nodes and then calculate the answer.
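Whichever route you take (paging through `SELECT DISTINCT` results with a driver, or a full scan via Spark), the final count happens client-side. A minimal sketch of that counting step in Python; the driver session and query are assumptions, so the result set is simulated with stub rows here to keep the logic self-contained:

```python
# Client-side distinct count over partition keys.
# In a real run the rows would come from something like:
#   session.execute("SELECT DISTINCT profile_key, profile_id "
#                   "FROM nishant_ana.ais_profile_table")
# (session being a DataStax-driver session -- an assumption, not shown here).

def count_distinct_keys(rows):
    """Count unique (profile_key, profile_id) pairs.

    Dedupes client-side with a set, which also covers the case where
    the rows come from a full scan rather than SELECT DISTINCT.
    """
    return len({tuple(row) for row in rows})

# Stub rows standing in for a real result set:
stub_rows = [("pk1", "p1"), ("pk1", "p1"), ("pk1", "p2"), ("pk2", "p1")]
print(count_distinct_keys(stub_rows))  # → 3
```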
