
ROWKEY Is Concatenated When Creating a Table from a Stream in ksqlDB

The stream is:

CREATE STREAM SENSORS_KSTREAM (sensorid INT,
  serialnumber VARCHAR,
  mfgdate VARCHAR,
  productname VARCHAR,
  customerid INT,
  locationid INT,
  macaddress VARCHAR,
  installationdate VARCHAR)
WITH (KAFKA_TOPIC='SENSORS_DETAILS', VALUE_FORMAT='AVRO', KEY='sensorid');

The table I created from it is:

CREATE TABLE SENSORS_KTABLE AS
SELECT sensorid, serialnumber, mfgdate, productname, customerid, locationid, macaddress, installationdate, COUNT(*) AS TOTAL 
FROM SENSORS_KSTREAM WINDOW TUMBLING (SIZE 1 MINUTES) 
GROUP BY sensorid, serialnumber, mfgdate, productname, customerid, locationid, macaddress, installationdate;

[screenshot: query output showing the concatenated ROWKEY]

The ROWKEY produced is not what I want.

I want only SENSORID as the rowkey.

Can anyone help me do this?

Thanks in advance.

PS: I am using Confluent 5.4.0 standalone.

ksqlDB stores the primary key of a table in the key of the underlying Kafka message. This is crucial for things like consistent partition assignment for the same key and for log compaction.
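You can see this for yourself by printing the table's changelog topic from the ksqlDB CLI. A minimal sketch, assuming the default sink topic name (a CREATE TABLE AS statement writes to a topic named after the table):

PRINT 'SENSORS_KTABLE' FROM BEGINNING;

Each message printed shows the Kafka message key alongside the value, which is where the concatenated ROWKEY you're seeing comes from.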

ksqlDB does not support compound keys, though this is a feature being worked on. In the meantime, when you group by multiple columns, ksqlDB does the best it can and builds the compound key you've encountered. Not great, but it actually works for many use cases.
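For illustration, ksqlDB versions of that era join the grouping values into a single STRING key using the |+| separator, so with your eight GROUP BY columns the key ends up looking something like this (values here are hypothetical):

1|+|SN-0001|+|2019-05-01|+|TempSensor|+|42|+|7|+|00:0a:95:9d:68:16|+|2020-01-15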

The statement you have above is creating a table with many columns in the primary key - and they're all currently getting serialized into a single STRING value.

You're asking to have only SENSORID in the key... but every column in your GROUP BY clause becomes part of the key.

It seems to me that you have a topic that contains a stream of updated values for sensors. That being the case, I would suggest looking into two options:

  1. If each row in your input topic contains all the data for each sensor, then why not just import it as a TABLE rather than a STREAM:
CREATE TABLE SENSORS_KSTREAM (sensorid INT,
  serialnumber VARCHAR,
  mfgdate VARCHAR,
  productname VARCHAR,
  customerid INT,
  locationid INT,
  macaddress VARCHAR,
  installationdate VARCHAR)
WITH (KAFKA_TOPIC='SENSORS_DETAILS', VALUE_FORMAT='AVRO', KEY='sensorid');
  2. Alternatively, LATEST_BY_OFFSET might be of use to capture the latest value for each column:
CREATE TABLE SENSORS_KTABLE AS
SELECT sensorid,
  LATEST_BY_OFFSET(serialnumber) AS serialnumber,
  LATEST_BY_OFFSET(mfgdate) AS mfgdate,
  LATEST_BY_OFFSET(productname) AS productname,
  LATEST_BY_OFFSET(customerid) AS customerid,
  LATEST_BY_OFFSET(locationid) AS locationid,
  LATEST_BY_OFFSET(macaddress) AS macaddress,
  LATEST_BY_OFFSET(installationdate) AS installationdate
FROM SENSORS_KSTREAM WINDOW TUMBLING (SIZE 1 MINUTES)
GROUP BY sensorid;
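Either way, once the table is keyed by SENSORID alone you can query it by key. A minimal sketch using push-query syntax (on older versions ROWKEY is a STRING, so the key literal needs quotes; newer versions preserve the INT type, in which case drop them):

SELECT * FROM SENSORS_KTABLE WHERE ROWKEY = '1' EMIT CHANGES;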

LATEST_BY_OFFSET was only introduced a couple of releases ago, so you may need to upgrade.

Hopefully these two options will help you get where you need to be.
