
Rowkey as Concatenated in Create Table from a Stream in ksqlDB

The stream is:

CREATE STREAM SENSORS_KSTREAM (sensorid INT,
  serialnumber VARCHAR,
  mfgdate VARCHAR,
  productname VARCHAR,
  customerid INT,
  locationid INT,
  macaddress VARCHAR,
  installationdate VARCHAR)
WITH (KAFKA_TOPIC='SENSORS_DETAILS', VALUE_FORMAT='AVRO', KEY='sensorid');

The table I created with this is:

CREATE TABLE SENSORS_KTABLE AS
SELECT sensorid, serialnumber, mfgdate, productname, customerid, locationid, macaddress, installationdate, COUNT(*) AS TOTAL 
FROM SENSORS_KSTREAM WINDOW TUMBLING (SIZE 1 MINUTES) 
GROUP BY sensorid, serialnumber, mfgdate, productname, customerid, locationid, macaddress, installationdate;


The ROWKEY produced is not what I want.

I want only SENSORID as the rowkey.

Can anyone help me do this?

Thanks in advance.

PS: I am using Confluent 5.4.0 standalone.

ksqlDB stores the primary key of a table in the key of the underlying Kafka message. This is crucial for things like consistent partition assignment for the same key, and log compaction.
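
You can see this for yourself by printing the table's changelog topic: the grouping columns end up in the message key, to the left of each value. A quick sketch, assuming the sink topic shares the table's name (the default for CREATE TABLE AS SELECT):

PRINT 'SENSORS_KTABLE' FROM BEGINNING;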

ksqlDB does not support compound keys, though this is a feature being worked on. So in the meantime, when you group by multiple columns, ksqlDB does the best it can and builds the compound key you've encountered. Not great, but it actually works for many use cases.
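
For example, with the GROUP BY above, each row's key ends up as all the grouping columns joined into one string, along these lines (illustrative values; older versions join them with a |+| separator):

42|+|SN-0001|+|2020-01-15|+|TempSensor-X1|+|7|+|3|+|00:1B:44:11:3A:B7|+|2020-02-20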

The statement you have above is creating a table with many columns in the primary key, and they're all currently getting serialized into a single STRING value.

You're asking to have only SENSORID in the key... but your GROUP BY clause makes every column listed after it part of the key.

It seems to me that you have a topic that contains a stream of updated values for sensors. That being the case, I would suggest looking into two options:

  1. If each row in your input topic contains all the data for each sensor, then why not just import it as a TABLE rather than a STREAM (named SENSORS_TABLE below, so it doesn't clash with the existing stream):
CREATE TABLE SENSORS_TABLE (sensorid INT,
  serialnumber VARCHAR,
  mfgdate VARCHAR,
  productname VARCHAR,
  customerid INT,
  locationid INT,
  macaddress VARCHAR,
  installationdate VARCHAR)
WITH (KAFKA_TOPIC='SENSORS_DETAILS', VALUE_FORMAT='AVRO', KEY='sensorid');
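
Note that importing the topic as a TABLE like this assumes the Kafka messages are already keyed by sensorid: in this version the KEY='sensorid' property only declares that the sensorid value column mirrors the message key; it does not re-key the data for you.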
  2. Alternatively, LATEST_BY_OFFSET might be of use to capture the latest value for each column:
CREATE TABLE SENSORS_KTABLE AS
SELECT sensorid, LATEST_BY_OFFSET(serialnumber) AS serialnumber, LATEST_BY_OFFSET(mfgdate) AS mfgdate, LATEST_BY_OFFSET(productname) AS productname, LATEST_BY_OFFSET(customerid) AS customerid, LATEST_BY_OFFSET(locationid) AS locationid, LATEST_BY_OFFSET(macaddress) AS macaddress, LATEST_BY_OFFSET(installationdate) AS installationdate
FROM SENSORS_KSTREAM WINDOW TUMBLING (SIZE 1 MINUTES) 
GROUP BY sensorid;
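
With the GROUP BY reduced to just sensorid, each row's key is now the sensor id alone (plus the window bounds from the tumbling window). As a quick check, assuming the table above was created successfully:

SELECT * FROM SENSORS_KTABLE EMIT CHANGES LIMIT 5;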

LATEST_BY_OFFSET was only introduced a couple of releases ago, so you may need to upgrade.

Hopefully these two options will help you get where you need to be.
