KSQL - Change the time zone in the WINDOW TUMBLING clause

Question

Here is my KSQL query using the WINDOW TUMBLING clause:

SELECT 
    sale_date,
    region,
    SUM(total)
FROM orders
WINDOW TUMBLING (SIZE 24 HOURS)
GROUP BY sale_date, region;

Some results:

2018-09-29|+|zskx_fz : Window{start=1538179200000 end=-} | 2018-09-29 | zskx_fz | 16119.8
2018-09-30|+|zskx_fz : Window{start=1538179200000 end=-} | 2018-09-30 | zskx_fz | 2031.6
2018-09-30|+|zskx_fz : Window{start=1538265600000 end=-} | 2018-09-30 | zskx_fz | 894.7

Converting the epoch millis to date-time gives:

1538179200000 = 2018-09-29 08:00:00 (UTC+8)
1538265600000 = 2018-09-30 08:00:00 (UTC+8)

As we can see, I'm in UTC+8. But regardless of the time zone, the window start should be 2018-09-29 00:00:00, not offset by 8 hours. So is it possible to change the time zone?

PS: I tried out several window sizes at 2018-09-30 11:33:00 and I'm totally lost.

WINDOW TUMBLING (SIZE 1 minutes)    2018-09-30 11:32:00
WINDOW TUMBLING (SIZE 2 hours)      2018-09-30 10:00:00
WINDOW TUMBLING (SIZE 5 hours)      2018-09-30 07:00:00
WINDOW TUMBLING (SIZE 10 hours)     2018-09-30 02:00:00
WINDOW TUMBLING (SIZE 11 hours)     2018-09-30 07:00:00
WINDOW TUMBLING (SIZE 12 hours)     2018-09-30 08:00:00
WINDOW TUMBLING (SIZE 24 hours)     2018-09-30 08:00:00

Answer 1

Timestamp windows are always calculated relative to the epoch, which is UTC/GMT.

I can see the validity of wanting to aggregate by day based on your time zone. I've raised it as an issue on the KSQL GitHub project, and suggest you track it there.
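The epoch-relative alignment can be checked with a little arithmetic. The following is a minimal Python sketch, not KSQL's actual code: it assumes each tumbling window starts at the largest multiple of the window size (counted from epoch 0, UTC) that is not after the event timestamp. The helper name `window_start` and the fixed `UTC8` offset are illustrative assumptions. Under that assumption it reproduces the start times listed in the question's PS.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the asker's zone modelled as a fixed UTC+8 offset (no DST).
UTC8 = timezone(timedelta(hours=8))
HOUR_MS = 60 * 60 * 1000


def window_start(ts_ms: int, size_ms: int) -> int:
    """Start of the epoch-aligned tumbling window that contains ts_ms."""
    return ts_ms - (ts_ms % size_ms)


# 2018-09-30 11:33:00 in UTC+8, the moment mentioned in the question's PS.
event_ms = int(datetime(2018, 9, 30, 11, 33, tzinfo=UTC8).timestamp() * 1000)

for hours in (2, 5, 10, 11, 12, 24):
    start_ms = window_start(event_ms, hours * HOUR_MS)
    local = datetime.fromtimestamp(start_ms / 1000, tz=UTC8)
    print(f"SIZE {hours:>2} hours -> {local:%Y-%m-%d %H:%M:%S}")
```

Because the boundaries are multiples of the window size in UTC, a 24-hour window always starts at 00:00 UTC, which is 08:00 on a UTC+8 wall clock.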

Answer 2

If you are only using tumbling windows, you can treat time as just another dimension, aggregate over that dimension, and not use any windowing at all. Here is an example. Suppose the input stream schema is the following:

<sale_date BIGINT, region VARCHAR, total DOUBLE>

Assuming sale_date is the timestamp of the sale and our local time is PST, we can use the TIMESTAMPTOSTRING function to extract different time granularities for each sale in a given time zone, as follows:

CREATE STREAM foo AS
  SELECT
    TIMESTAMPTOSTRING(sale_date, 'yyyy-MM-dd HH', 'PST') AS sale_hour,
    TIMESTAMPTOSTRING(sale_date, 'yyyy-MM-dd', 'PST') AS sale_day,
    TIMESTAMPTOSTRING(sale_date, 'yyyy-MM', 'PST') AS sale_month,
    region,
    total
  FROM orders;

Now you should be able to write your aggregate queries over this stream. For instance, for daily sales per region you can write the following query:

CREATE TABLE daily_sale AS
  SELECT sale_day, region, SUM(total)
  FROM foo
  GROUP BY sale_day, region;

Note that you don't need to specify a window for the above query.
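To see why grouping by a local-day string sidesteps the window boundary problem, here is a small Python sketch of the same idea outside KSQL. It is only an illustration: the `local_day` helper, the fixed `UTC8` offset, and the three sample orders (timestamps and totals) are hypothetical and do not come from the asker's data.

```python
from collections import defaultdict
from datetime import datetime, timedelta, timezone

# Assumption: the local zone modelled as a fixed UTC+8 offset (no DST).
UTC8 = timezone(timedelta(hours=8))


def local_day(ts_ms: int, tz: timezone) -> str:
    """Format an epoch-millis timestamp as a yyyy-MM-dd day in the given zone."""
    return datetime.fromtimestamp(ts_ms / 1000, tz=tz).strftime("%Y-%m-%d")


# Hypothetical orders: (sale_date in epoch millis, region, total).
orders = [
    (1538236800000, "zskx_fz", 100.0),  # 2018-09-30 00:00:00 UTC+8
    (1538265599000, "zskx_fz", 50.0),   # 2018-09-30 07:59:59 UTC+8
    (1538265600000, "zskx_fz", 25.0),   # 2018-09-30 08:00:00 UTC+8
]

# Aggregate on (local day, region) instead of on a time window.
daily = defaultdict(float)
for ts_ms, region, total in orders:
    daily[(local_day(ts_ms, UTC8), region)] += total

print(dict(daily))
```

All three sample orders land under the single key ("2018-09-30", "zskx_fz"), whereas an epoch-aligned 24-hour window would separate the first two from the third at 08:00 local time.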

Answer 1
1 2018-10-01 09:28:48

Answer 2
1 2018-10-08 23:48:14
