Data in ksql table not being persistent

We are using Confluent Platform on Ubuntu. Simple JSON data is sent with cURL requests to the Kafka REST Proxy (kafka-rest), which writes it to a Kafka topic named "UE_Context".

A KSQL stream named "UE_CONTEXT_STREAM" is created on this topic with the following command:

CREATE STREAM UE_Context_Stream (ue_key VARCHAR, ecgi VARCHAR) WITH (KAFKA_TOPIC='UE_Context', VALUE_FORMAT='JSON');

A KSQL table named "UE_CONTEXT_TABLE" is created on the same topic with the following command:

CREATE TABLE UE_Context_Table ( registertime BIGINT, ue_key VARCHAR, ecgi VARCHAR) WITH (KAFKA_TOPIC='UE_Context', KEY='ue_key', VALUE_FORMAT='JSON');

Two records are produced to the topic with the following cURL commands:

curl -X POST -H "Accept: application/json" -H "Content-Type: application/vnd.kafka.json.v1+json" --data '{"records":[{"key": "0x1234", "value":{"ue_key": "0x1234", "ecgi" : "1234"}}]}' "http://localhost:8082/topics/UE_Context"  
curl -X POST -H "Accept: application/json" -H "Content-Type: application/vnd.kafka.json.v1+json" --data '{"records":[{"key": "0x1234", "value":{"ue_key": "0x4321", "ecgi" : "4321"}}]}' "http://localhost:8082/topics/UE_Context"      

A SELECT query is left running against the table:

[Screenshot: KSQL query]

This query displays the table data while JSON data is being sent to the topic. We then stop sending JSON data to the topic and terminate the SELECT query. If a SELECT is run again later, the previously populated table data is not displayed. Is there no way to persist this data? Kafka connectors writing to a database might be an option, but does KSQL not have some transient memory in which to store the table data?

> If a select is performed at a later point in time, the previously populated table info is not displayed.

A SELECT statement starts reading from the latest offsets of the topic by default, so it only returns records produced after the query starts.

If you want to see earlier data, you need to reset the consumer offset to the beginning of the topic:

SET 'auto.offset.reset'='earliest';
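A minimal sketch of the full CLI session, assuming a ksqlDB release where push queries require `EMIT CHANGES` (older KSQL releases accept the same SELECT without that clause) and assuming the table from the question already exists:

```sql
-- Make subsequent queries in this CLI session start from the first offset
SET 'auto.offset.reset'='earliest';

-- Push query: replays the records already in the UE_Context topic, then
-- keeps printing new changes until it is cancelled (Ctrl+C in the CLI)
SELECT * FROM UE_Context_Table EMIT CHANGES;
```

The setting only affects the current session, so it has to be set again before re-running the SELECT in a new session.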

Also, as mentioned in the documentation:

A SELECT statement by itself is a non-persistent continuous query. The result of a SELECT statement isn't persisted in a Kafka topic and is only printed in the KSQL console. Don't confuse persistent queries created by CREATE STREAM AS SELECT with the streaming query result from a SELECT statement.
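To illustrate the difference, here is a sketch assuming ksqlDB syntax and the stream from the question; `UE_Context_Copy` is a hypothetical name. The first statement only prints results to the console. The second starts a persistent query on the server that keeps writing its output to a Kafka topic, even after the CLI session ends.

```sql
-- Transient push query: rows are printed to the console and
-- nothing is kept once the query is cancelled
SELECT ue_key, ecgi FROM UE_Context_Stream EMIT CHANGES;

-- Persistent query: runs on the server and writes every result to a
-- backing Kafka topic (named UE_CONTEXT_COPY unless KAFKA_TOPIC is set)
CREATE STREAM UE_Context_Copy AS
  SELECT ue_key, ecgi
  FROM UE_Context_Stream
  EMIT CHANGES;
```

`SHOW QUERIES;` lists the running persistent queries, and `TERMINATE` followed by a query ID stops one.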

As stated in the ksqlDB GitHub README:

ksqlDB allows you to define materialized views over your streams and tables. Materialized views are defined by what is known as a "persistent query". These queries are known as persistent because they maintain their incrementally updated results using a table.

There is a lot more documentation on materialized views than on persistent queries, so read on:

The benefit of a materialized view is that it evaluates a query on the changes only (the delta), instead of evaluating the query on the entire table. ...

In ksqlDB, a table can be materialized into a view or not. If a table is created directly on top of a Kafka topic, it's not materialized. Non-materialized tables can't be queried, because they would be highly inefficient. On the other hand, if a table is derived from another collection, ksqlDB materializes its results, and you can make queries against it.
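As a sketch, assuming a recent ksqlDB release (one that supports `LATEST_BY_OFFSET`, named key columns and pull queries) and the stream from the question; `UE_Context_Latest` is a hypothetical name. Deriving a table from the stream with `CREATE TABLE ... AS SELECT` makes ksqlDB materialize it, so its current state can be looked up by key at any time instead of replaying the topic.

```sql
-- Read the topic from the beginning so existing records are included
SET 'auto.offset.reset'='earliest';

-- Derived (materialized) table: maintained incrementally by a persistent
-- query, keeping the most recent ecgi seen for each ue_key
CREATE TABLE UE_Context_Latest AS
  SELECT ue_key, LATEST_BY_OFFSET(ecgi) AS ecgi
  FROM UE_Context_Stream
  GROUP BY ue_key
  EMIT CHANGES;

-- Pull query: returns the current row for one key, then terminates
SELECT * FROM UE_Context_Latest WHERE ue_key = '0x1234';
```

Because the table is backed by a persistent query and a changelog topic, its contents survive the end of any individual SELECT.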
