简体   繁体   中英

KSQL create table with multi-column aggregation

So essentially I'd like to group by two columns, like this:

CREATE TABLE foo AS
SELECT a, b, SUM(a)
FROM whatever
GROUP BY a, b

whatever is a stream in Kafka format. When I issue the command, ksql returns:

Key format does not support schema.
format: KAFKA
schema: Persistence{columns=[`a` STRING KEY, `b` STRING KEY], features=[]}
reason: The 'KAFKA' format only supports a single field. Got: [`a` STRING KEY, `b` STRING KEY]
Caused by: The 'KAFKA' format only supports a single field. Got: [`DEVICE`
    a KEY, `b` STRING KEY] 

The problem is that the Kafka format does not support multi-column keys. Is there a way to workaround this, eg creating an artificial key in this table? I did not manage to do this.

I saw someone posted a similiar question and the answer seemed to work. I suggest that's because of the format. https://stackoverflow.com/a/50193239/9632703

The documentation mentions that multi-column aggregations might not work, though also saying that ksql does a background workaround to make it work. Unfortunately ksql only returns the given error message. https://www.confluent.de/blog/ksqldb-0-10-updates-key-columns/#multi-column-aggregations

The funny part is that omitting the first line CREATE TABLE foo AS works. So if some data comes in, the aggregation works. But that's not persistent, of course. If nothing else works, I would also be fine with having a table without a primary key defined, if possible in ksql, since I could still identify the data with {a, b} in my application.

Can someone help me? Thank you.

You can do this if you upgrade to ksqlDB 0.15. This version introduced multi-key support . You'll need to use a KEY_FORMAT that supports it.

ksql> CREATE TABLE FOO AS SELECT A, B, SUM(C) FROM TEST_STREAM GROUP BY A,B;

 Message
-----------------------------------
 Created query with ID CTAS_FOO_53
-----------------------------------
ksql> DESCRIBE FOO;

Name                 : FOO
 Field      | Type
---------------------------------------------
 A          | BIGINT           (primary key)
 B          | VARCHAR(STRING)  (primary key)
 KSQL_COL_0 | DOUBLE
---------------------------------------------
For runtime statistics and query details run: DESCRIBE EXTENDED <Stream,Table>;

ksql> SELECT * FROM FOO EMIT CHANGES LIMIT 5;
+---------------------------+---------------------------+---------------------------+
|A                          |B                          |KSQL_COL_0                 |
+---------------------------+---------------------------+---------------------------+
|220071000                  |AIS                        |0.4                        |
|257838000                  |AIS                        |6.2                        |
|538007854                  |AIS                        |22.700000000000003         |
|257487000                  |AIS                        |2.4                        |
|257601800                  |AIS                        |5.8999999999999995         |
Limit Reached
Query terminated

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM