
KSQL: Topic backing a KSQL table not getting compacted

I am using KSQL to track the delay between stops for a fleet management system. For simplicity I have two streams, trips and tasks, which get their data feed from Debezium. So far so good.

My problem is that when I create a KSQL table that reflects some aggregated data, I assume the backing topic should eventually hold compacted results, but in fact it does not, as in the example below:

-- trips stream
CREATE STREAM trips_raw (
            id bigint, gross_merchandise_value double, vehicle_id bigint, trip_code string,
            status string,time_slot string,  number_of_orders integer, supplier_id integer, 
            trip_start_time bigint, agent_id integer, trip_number integer,  returnes_handled BOOLEAN,
            modification_date bigint, created_by integer, modified_by integer, creation_date bigint
            )
WITH (KAFKA_TOPIC='trips', VALUE_FORMAT='json');

--tasks stream
CREATE STREAM tasks_raw (id bigint, delivery_trip_id bigint, agent_id integer, creation_date bigint, 
                                modification_date bigint, 
                                status string, created_by integer, modified_by integer, request_id bigint) 
WITH (KAFKA_TOPIC='tasks',VALUE_FORMAT='json');

-- THE AGGREGATED TABLE (just a simple view for the sake of simplicity)
create table trips_actions_count as
    select count(1), ID from trips_raw
    group by ID;


----- TEST DATA ------
INSERT INTO trips_raw (
    id, gross_merchandise_value, vehicle_id, trip_code, status, trip_start_time, MODIFICATION_DATE, CREATED_BY, MODIFIED_BY, CREATION_DATE
) VALUES (
    1, 100.5, 523, 'TRIP_1', 'CREATED', 1616480285000, 1616530285000, 123, 123, 1616444781000
);

INSERT INTO trips_raw (
    id, gross_merchandise_value, vehicle_id, trip_code, status, trip_start_time, MODIFICATION_DATE, CREATED_BY, MODIFIED_BY, CREATION_DATE
) VALUES (
    1, 100.5, 523, 'TRIP_1', 'ARRIVED', 1616480285000, 1616540285000, 123, 123, 1616444781000
);

INSERT INTO trips_raw (
    id, gross_merchandise_value, vehicle_id, trip_code, status, trip_start_time, MODIFICATION_DATE, CREATED_BY, MODIFIED_BY, CREATION_DATE
) VALUES (
    1, 100.5, 523, 'TRIP_1', 'COMPLETED', 1616480285000, 1616550285000, 123, 123, 1616444781000
);

When I tail the topic created to back the table TRIPS_ACTIONS_COUNT, I get the results below:

kafka-console-consumer --bootstrap-server localhost:9092 --topic TRIPS_ACTIONS_COUNT --from-beginning
{"KSQL_COL_0":1}
{"KSQL_COL_0":2}
{"KSQL_COL_0":3} 

kafka-topics --bootstrap-server localhost:9092 --describe --topics-with-overrides --topic TRIPS_ACTIONS_COUNT
Topic: TRIPS_ACTIONS_COUNT  PartitionCount: 3   ReplicationFactor: 1    Configs: cleanup.policy=compact,segment.bytes=1073741824

I assume that TRIPS_ACTIONS_COUNT should be compacted, so that when a consumer reads it, it should only get the latest value for a specific key, which in my case is {"KSQL_COL_0":3}.

I think I am missing something, but I am not sure what it is yet.

You're only printing the values, whereas compaction happens on topic record keys (you need to add --property print.key=true), and it only happens on closed segments, where the default segment size is 1 GB. In other words, 3 records is not enough for compaction to occur anyway.
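For example (a sketch using the standard Kafka CLI tools), printing the keys shows the per-key changelog, and lowering the topic's segment and cleaner configs can make compaction observable on a low-volume test topic; the exact values here are illustrative:

kafka-console-consumer --bootstrap-server localhost:9092 \
    --topic TRIPS_ACTIONS_COUNT --from-beginning \
    --property print.key=true

# Optional: roll segments quickly and let the log cleaner run on a
# nearly-clean log, so compaction can kick in on a test topic.
# Not recommended for production topics.
kafka-configs --bootstrap-server localhost:9092 --alter \
    --entity-type topics --entity-name TRIPS_ACTIONS_COUNT \
    --add-config segment.ms=10000,min.cleanable.dirty.ratio=0.01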

In general, the issue is that you're consuming a stream (a raw topic), which is the changelog of events that happen within the table. You should instead be selecting from your table with the ksql CLI if you truly want to see the grouped data, as in the sketch below.
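A minimal sketch of that, assuming a recent ksqlDB where push queries require EMIT CHANGES (older KSQL versions omit the clause):

-- from the ksql CLI: stream the table's updates; the latest state per key
-- is a single row, e.g. ID 1 with a count of 3
SELECT * FROM trips_actions_count EMIT CHANGES;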
