
How to unnest a table based on date intervals in Presto?

I have a table of events, where start_dt is the start and end_dt the end of the event. The table is partitioned by a dt column derived from end_dt. This means that events that start before midnight and end after midnight are present in only one partition. What I need to do is split each event into as many rows as the number of dates the event spans. Is there a smart way to achieve this using Presto SQL syntax?

Input:

  id  | start_dt                 | end_dt                  | dt
------+--------------------------+-------------------------+----------
 1    | 2020-09-24 21:56:12.669  | 2020-09-25 00:26:16.440 | 2020-09-25
 2    | 2020-09-25 17:12:02.699  | 2020-09-25 17:42:02.699 | 2020-09-25
 3    | 2020-09-23 23:47:29.146  | 2020-09-25 00:17:29.146 | 2020-09-25

Expected output:

  id  | start_dt                 | end_dt                  | dt
------+--------------------------+-------------------------+----------
 1    | 2020-09-24 21:56:12.669  | 2020-09-24 23:59:59.999 | 2020-09-24
 1    | 2020-09-25 00:00:00.001  | 2020-09-25 00:26:16.440 | 2020-09-25
 2    | 2020-09-25 17:12:02.699  | 2020-09-25 17:42:02.699 | 2020-09-25
 3    | 2020-09-23 23:47:29.146  | 2020-09-23 23:59:59.999 | 2020-09-23
 3    | 2020-09-24 00:00:00.001  | 2020-09-24 23:59:59.999 | 2020-09-24
 3    | 2020-09-25 00:00:00.001  | 2020-09-25 00:17:29.146 | 2020-09-25

In Presto, you can use sequence() to generate an array of dates. The rest is just unnesting and conditional logic:

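select t.id,
    -- first day keeps the real start; later days start at midnight
    case when date(t.start_dt) = s.dt then t.start_dt else cast(s.dt as timestamp)                    end as new_start_dt,
    -- last day keeps the real end; earlier days end at the following midnight
    case when date(t.end_dt)   = s.dt then t.end_dt   else cast(s.dt as timestamp) + interval '1' day end as new_end_dt,
    s.dt
from mytable t
-- one row per calendar date the event touches (sequence() on dates steps by 1 day)
cross join unnest(sequence(date(t.start_dt), date(t.end_dt))) as s(dt)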
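To see what the query returns for the sample rows, here is a minimal Python sketch that mirrors its logic row by row: one output row per calendar date from date(start_dt) to date(end_dt), keeping the real start on the first day and the real end on the last day, with midnight boundaries in between. It is not Presto code; the function name split_by_day and the use of naive datetime values are assumptions made for illustration.

```python
from datetime import datetime, time, timedelta


def split_by_day(event_id, start_dt, end_dt):
    """Split one event into per-day rows (hypothetical helper).

    Mirrors sequence(date(start_dt), date(end_dt)) + CROSS JOIN UNNEST,
    followed by the two CASE expressions. Intervals are half-open, so
    intermediate boundaries fall exactly on midnight.
    """
    rows = []
    day = start_dt.date()
    last = end_dt.date()
    while day <= last:  # sequence() includes both endpoints
        midnight = datetime.combine(day, time.min)  # cast(s.dt as timestamp)
        new_start = start_dt if start_dt.date() == day else midnight
        new_end = end_dt if end_dt.date() == day else midnight + timedelta(days=1)
        rows.append((event_id, new_start, new_end, day))
        day += timedelta(days=1)
    return rows


# The three sample events from the question
events = [
    (1, datetime(2020, 9, 24, 21, 56, 12, 669000), datetime(2020, 9, 25, 0, 26, 16, 440000)),
    (2, datetime(2020, 9, 25, 17, 12, 2, 699000), datetime(2020, 9, 25, 17, 42, 2, 699000)),
    (3, datetime(2020, 9, 23, 23, 47, 29, 146000), datetime(2020, 9, 25, 0, 17, 29, 146000)),
]

for event in events:
    for row in split_by_day(*event):
        print(*row)
```

For these inputs, event 1 yields two rows, event 2 one row, and event 3 three rows (2020-09-23, 2020-09-24, 2020-09-25), matching the row count of the expected output; only the boundary timestamps differ, because the query uses midnight instead of the ±1 ms values.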

Note that this generates date intervals that start and end exactly at midnight: half-open interval logic makes more sense to me than removing or adding a millisecond here and there. You can easily change that if you like.
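If you do want the closed-interval style from the expected output in the question (23:59:59.999 / 00:00:00.001), the shift only needs to touch boundaries that were generated at midnight, i.e. the query's else branches, and never the event's real start or end. The Python sketch below illustrates that rule on a single row; the helper name to_closed and the 1 ms step are assumptions taken from the expected output, not part of the answer. In the query itself, the same shift could presumably be applied in the else branches with Presto's date_add('millisecond', ...).

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)


def to_closed(row_start, row_end, event_start, event_end):
    """Convert one half-open per-day row to the +/-1 ms closed style.

    Only generated (midnight) boundaries are shifted; a boundary equal to
    the event's own start or end is left untouched, which mirrors putting
    the adjustment in the CASE expressions' else branches.
    """
    if row_start != event_start:
        row_start += ONE_MS  # 00:00:00.000 -> 00:00:00.001
    if row_end != event_end:
        row_end -= ONE_MS  # next midnight -> 23:59:59.999 of this day
    return row_start, row_end


# Event 3 from the question and its middle-day half-open row
event_start = datetime(2020, 9, 23, 23, 47, 29, 146000)
event_end = datetime(2020, 9, 25, 0, 17, 29, 146000)
print(to_closed(datetime(2020, 9, 24), datetime(2020, 9, 25), event_start, event_end))
```

One consequence worth noting: with these closed intervals the instant 00:00:00.000 belongs to no row at all, which is part of why the half-open form is simpler to reason about.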

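Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.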

 
© 2020-2024 STACKOOM.COM