KSQL - Change the time zone in WINDOW TUMBLING clause

Here is my KSQL query using the WINDOW TUMBLING clause:

SELECT 
    sale_date,
    region,
    SUM(total)
FROM orders
WINDOW TUMBLING (SIZE 24 HOURS)
GROUP BY sale_date, region;

Some results:

2018-09-29|+|zskx_fz : Window{start=1538179200000 end=-} | 2018-09-29 | zskx_fz | 16119.8
2018-09-30|+|zskx_fz : Window{start=1538179200000 end=-} | 2018-09-30 | zskx_fz | 2031.6
2018-09-30|+|zskx_fz : Window{start=1538265600000 end=-} | 2018-09-30 | zskx_fz | 894.7

Converting the epoch milliseconds to date-times gives:

1538179200000 = 2018-09-29 08:00:00 (UTC+8)
1538265600000 = 2018-09-30 08:00:00 (UTC+8)

As you can see, I'm in UTC+8. But regardless of the time zone, I expected the window start to be 2018-09-29 00:00:00, not shifted by 8 hours. Is it possible to change the time zone?

PS: I tried several window sizes at 2018-09-30 11:33:00 and I'm completely lost. Each line shows the window size and the window start I got:

WINDOW TUMBLING (SIZE 1 minutes)    2018-09-30 11:32:00
WINDOW TUMBLING (SIZE 2 hours)      2018-09-30 10:00:00
WINDOW TUMBLING (SIZE 5 hours)      2018-09-30 07:00:00
WINDOW TUMBLING (SIZE 10 hours)     2018-09-30 02:00:00
WINDOW TUMBLING (SIZE 11 hours)     2018-09-30 07:00:00
WINDOW TUMBLING (SIZE 12 hours)     2018-09-30 08:00:00
WINDOW TUMBLING (SIZE 24 hours)     2018-09-30 08:00:00

Timestamp windows are always calculated relative to the epoch, which is UTC/GMT. A 24-hour window therefore starts at 00:00:00 UTC, which is 08:00:00 in UTC+8.
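
To make the alignment concrete, here is a minimal Python sketch of the arithmetic, assuming the window start is simply the timestamp rounded down to a multiple of the window size since the epoch. The function names are hypothetical and this is not KSQL's actual code; it only illustrates the rule for the 2018-09-30 11:33:00 (UTC+8) timestamp from the question.

```python
from datetime import datetime, timedelta, timezone

UTC8 = timezone(timedelta(hours=8))  # the asker's local offset (assumed fixed, no DST)
HOUR_MS = 3_600_000


def window_start_ms(ts_ms: int, size_ms: int) -> int:
    """Start of the epoch-aligned tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


def to_local(ms: int) -> str:
    """Format epoch milliseconds as a UTC+8 wall-clock string."""
    return datetime.fromtimestamp(ms / 1000, tz=UTC8).strftime("%Y-%m-%d %H:%M:%S")


# 2018-09-30 11:33:00 in UTC+8, expressed as epoch milliseconds
ts = int(datetime(2018, 9, 30, 11, 33, tzinfo=UTC8).timestamp() * 1000)

for hours in (2, 5, 10, 11, 12, 24):
    start = window_start_ms(ts, hours * HOUR_MS)
    print(f"SIZE {hours:>2} hours -> {to_local(start)}")
```

Under that assumption the sketch gives the same window starts as the question's list for the 2, 5, 10, 11, 12 and 24 hour sizes: the boundaries are multiples of the window size counted from 1970-01-01 00:00:00 UTC, so they only line up with local midnight if the local offset is zero.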

I can see the validity of wanting to aggregate by day based on your time zone. I've raised it as an issue on the KSQL GitHub project, and suggest you track it there.

If you are only using tumbling windows, you can treat time as just another dimension, aggregate over that dimension, and not use any windowing at all. Here is an example. Suppose the input stream has the following schema:

<sale_date BIGINT, region VARCHAR, total DOUBLE>

Assuming sale_date is the timestamp of the sale and our local time is PST, we can use the TIMESTAMPTOSTRING function to extract different time granularities for each sale in a given time zone, as follows:

CREATE STREAM foo AS
    SELECT
        TIMESTAMPTOSTRING(sale_date, 'yyyy-MM-dd HH', 'PST') AS sale_hour,
        TIMESTAMPTOSTRING(sale_date, 'yyyy-MM-dd', 'PST') AS sale_day,
        TIMESTAMPTOSTRING(sale_date, 'yyyy-MM', 'PST') AS sale_month,
        region,
        total
    FROM orders;

Now you should be able to write your aggregate queries over this stream. For instance, for daily sales per region you can write the following query:

CREATE TABLE daily_sale AS SELECT sale_day, region, sum(total) FROM foo GROUP BY sale_day, region;

Note that you don't need to specify a window for the above query.
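
The same idea can be sketched outside KSQL. The Python snippet below is only an illustration: it derives a local-day key from each sale timestamp and sums totals per (day, region), with no window involved. The fixed UTC+8 offset (taken from the question rather than PST), the function names, and the sample rows are all assumptions made up for the example.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

LOCAL_TZ = timezone(timedelta(hours=8))  # assumed fixed UTC+8 offset, no DST


def local_day(ts_ms: int) -> str:
    """Local calendar day (yyyy-MM-dd) for an epoch-millisecond timestamp."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=LOCAL_TZ).strftime("%Y-%m-%d")


def daily_sales(orders):
    """Sum totals per (local day, region); time is just another grouping key."""
    totals = defaultdict(float)
    for ts_ms, region, total in orders:
        totals[(local_day(ts_ms), region)] += total
    return dict(totals)


# Made-up sample rows: (sale_date in epoch ms, region, total)
orders = [
    (1538236799000, "zskx_fz", 10.0),   # 2018-09-29 23:59:59 UTC+8
    (1538236800000, "zskx_fz", 100.0),  # 2018-09-30 00:00:00 UTC+8
    (1538265599000, "zskx_fz", 50.0),   # 2018-09-30 07:59:59 UTC+8
    (1538265600000, "zskx_fz", 25.0),   # 2018-09-30 08:00:00 UTC+8
]

result = daily_sales(orders)
for key in sorted(result):
    print(key, result[key])
```

In this sample the last three rows all fall on local day 2018-09-30 and are summed together, whereas an epoch-aligned 24-hour window would put the 08:00:00 row in a different window from the two rows before it.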
