
How does a tumbling window work in KSQL? My query returns the same result with or without the tumbling window

I am using a KSQL stream and want to count the events that arrive in each 5-minute interval. Here is my query:

select count(*), created_on_date from TABLE_NAME window tumbling (size 5 minutes) group by created_on_date;

It returns these results:

2 | 2018-11-13 09:54:50
3 | 2018-11-13 09:54:49
3 | 2018-11-13 09:54:52
3 | 2018-11-13 09:54:51
3 | 2018-11-13 09:54:50

The query without the tumbling window:

select count(*), created_on_date from OP_UPDATE_ONLY group by created_on_date;

Result:

1 | 2018-11-13 09:55:08
2 | 2018-11-13 09:55:09
1 | 2018-11-13 09:55:10
3 | 2018-11-13 09:55:09
4 | 2018-11-13 09:55:12

Both queries return the same results, so what difference does the tumbling window make?

A tumbling window aggregation counts the number of events for each key within fixed-size, non-overlapping windows of time. The window is based on the timestamp of your stream, which is inherited from the Kafka message by default but can be overridden with WITH (TIMESTAMP='my_column'). So you could declare created_on_date as the timestamp column and then aggregate by the values there.

The second query aggregates over the entire stream of messages. Since you happen to have a timestamp in the message itself, grouping by it gives the illusion of a time-based aggregation. However, if you wanted to find out how many events arrived within, for example, an hour, this would be no use (you can only count at the grain of created_on_date).

So the first example, with a window, is usually the correct way to do it, because you usually want to answer a business question about an aggregation within a given time period, not over the course of an arbitrary stream of data.
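To make the difference concrete, here is a minimal Python sketch (not KSQL itself) of how events fall into 5-minute tumbling windows. It assumes windows are aligned to the Unix epoch and that created_on_date is used as the record timestamp; the event list is made up for illustration and is not the data from the question.

```python
from collections import Counter
from datetime import datetime, timezone

WINDOW_SIZE_MS = 5 * 60 * 1000  # 5-minute tumbling window
FORMAT = "%Y-%m-%d %H:%M:%S"


def to_epoch_ms(text):
    """Parse a created_on_date string (treated as UTC) into epoch milliseconds."""
    dt = datetime.strptime(text, FORMAT).replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)


def window_start(ts_ms, size_ms=WINDOW_SIZE_MS):
    """Start of the epoch-aligned tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def fmt(ts_ms):
    """Format epoch milliseconds back into a readable UTC string."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).strftime(FORMAT)


# Hypothetical events; each string doubles as key and record timestamp.
events = [
    "2018-11-13 09:54:49",
    "2018-11-13 09:54:50",
    "2018-11-13 09:54:50",
    "2018-11-13 09:55:08",
    "2018-11-13 09:55:09",
    "2018-11-13 09:55:09",
]

# No window, GROUP BY created_on_date: one count per distinct second.
per_key = Counter(events)

# Tumbling window plus GROUP BY created_on_date: one count per (window, key).
per_window_and_key = Counter(
    (fmt(window_start(to_epoch_ms(e))), e) for e in events
)

# Tumbling window alone: one count per 5-minute bucket.
per_window = Counter(fmt(window_start(to_epoch_ms(e))) for e in events)

print(per_key)
print(per_window_and_key)
print(per_window)
```

Because the grouping key is already a per-second timestamp, every key lands in exactly one window, so the windowed and non-windowed counts per key are identical. The window only changes the answer once you group by something coarser than the timestamp, as in the per_window count.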

