How KSQL Windowed query works and maximum window size

I have two questions about KSQL queries that use windowing:

  1. Let's say I have the following aggregation query :

    SELECT id, COUNT(*) FROM testtopic_stream WINDOW TUMBLING (SIZE 30 DAYS) GROUP BY id;

Are the results of the aggregation above calculated using only the new tick that comes in, or does the query go through all the data for the last 30 days and then perform the aggregation?

  2. What is the maximum possible window size for queries? I see that I can set up a window of even 30 days, and the query seems to work fine. Is there a recommended maximum window size?

It depends on the auto.offset.reset strategy. If you set it to "earliest", the query will consume all data from the underlying stream/topic (note that "all" really means all data stored in the topic, i.e., how much data that is depends on the topic's retention setting). If you set the config to "latest", which is the default, the query will only process data that is written by upstream producers after the query was started.

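In both cases, the size of the window has no impact on what data will be processed. Each incoming record updates the stored aggregate for its window incrementally; the query does not re-read the previous 30 days of data for every new record.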
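
As a rough sketch, assuming the KSQL CLI and the stream from the question, the offset policy can be set for the current session before the query is issued. Newer ksqlDB versions additionally expect EMIT CHANGES at the end of a push query, so the exact syntax may differ depending on your version:

    -- Read the topic from the beginning (everything still retained in it)
    SET 'auto.offset.reset' = 'earliest';

    SELECT id, COUNT(*) FROM testtopic_stream WINDOW TUMBLING (SIZE 30 DAYS) GROUP BY id;

Setting the value to 'latest' instead makes the same query count only records produced after it was started.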

There is no limit on the window size. You can pick any size you want. Note: for tumbling windows and a given retention period, a smaller window size in fact increases the storage requirement, while a larger window size reduces it, because there are fewer windows that need to be maintained in parallel.

