Hive logic to get min time, max time and other columns

I have data in the following format:

+---------------------+-------------------------+-------------------------+-----------+------+
|         id          |       start time        |        end time         | direction | name |
+---------------------+-------------------------+-------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 15:10:28.677 | 2015-06-02 15:32:22.677 |         3 | xyz  |
| 9202340753368000000 | 2015-06-02 14:55:37.353 | 2015-06-02 15:12:18.84  |         1 | xyz  |
+---------------------+-------------------------+-------------------------+-----------+------+

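For each id, I need the minimum start time, the maximum end time, and the direction and name values from the row with the minimum start time: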

+---------------------+-------------------------+------------------------+-----------+------+
|         id          |       start time        |        end time        | direction | name |
+---------------------+-------------------------+------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 14:55:37.353 | 2015-06-02 15:32:22.677|         1 | xyz  |
+---------------------+-------------------------+------------------------+-----------+------+

I tried using:

select x.id, min(x.start_time) as mintime, max(x.end_time) maxtime , y.direction, y.name   
 from dir_samp x inner join ( 
 select id, start_time,  end_time, name, direction ,  
   rank() over ( partition by id
                order by start_time asc) as r 
   from dir_samp 
) y  on x.id = y.id  where y.r = 1 group by x.id , y.direction, y.name

Is there a more efficient way to do this? Please advise.

Thanks

You don't need the inner join:

select y.id, min(y.start_time) as mintime, 
       max(y.end_time) maxtime , 
       max(case when y.r=1 then y.direction end) as direction, 
       max(case when y.r=1 then y.name end) as name 
from
( 
 select id, start_time,  end_time, name, direction ,  
   rank() over ( partition by id order by start_time asc) as r 
   from dir_samp 
) y 
group by y.id;
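
To sanity-check the rank-based query above without a Hive cluster, here is a minimal sketch that runs it against the two sample rows in an in-memory SQLite database. This rests on assumptions: SQLite (3.25 or later, for window functions) stands in for Hive, timestamps are stored as fixed-width text, and the helper name `run_query` is made up for illustration.

```python
import sqlite3

# Sample rows from the question. Assumption: the ".84" fraction is padded to
# ".840" so that plain string comparison matches chronological order.
ROWS = [
    (9202340753368000000, "2015-06-02 15:10:28.677", "2015-06-02 15:32:22.677", 3, "xyz"),
    (9202340753368000000, "2015-06-02 14:55:37.353", "2015-06-02 15:12:18.840", 1, "xyz"),
]

# Same shape as the Hive query: rank rows per id by start_time, then pick
# direction and name from the rank-1 row with a conditional aggregate.
QUERY = """
select y.id,
       min(y.start_time) as mintime,
       max(y.end_time)   as maxtime,
       max(case when y.r = 1 then y.direction end) as direction,
       max(case when y.r = 1 then y.name end)      as name
from (
    select id, start_time, end_time, name, direction,
           rank() over (partition by id order by start_time asc) as r
    from dir_samp
) y
group by y.id
"""


def run_query(rows):
    """Load rows into an in-memory dir_samp table and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table dir_samp ("
        "id integer, start_time text, end_time text, direction integer, name text)"
    )
    conn.executemany("insert into dir_samp values (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result


if __name__ == "__main__":
    for row in run_query(ROWS):
        print(row)
```

Note that `rank()` gives every row tied on the earliest start_time a rank of 1, so in that case the `max(case ...)` expressions pick the largest direction and the largest name independently of each other.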

Another option is to take the min of a struct, which avoids the window function altogether:

select      id
           ,min_vals.start_time
           ,end_time
           ,min_vals.direction
           ,min_vals.name

from       (select      id  
                       ,min(named_struct('start_time',start_time,'direction',direction,'name',name)) as min_vals
                       ,max(end_time)                                                                as end_time

            from        dir_samp

            group by    id
            ) t
;

+---------------------+----------------------------+----------------------------+-----------+------+
| id                  | start_time                 | end_time                   | direction | name |
+---------------------+----------------------------+----------------------------+-----------+------+
| 9202340753368000000 | 2015-06-02 14:55:37.353000 | 2015-06-02 15:32:22.677000 | 1         | xyz  |
+---------------------+----------------------------+----------------------------+-----------+------+
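
The struct-based query relies on structs being compared field by field in declaration order, so the struct with the earliest start_time wins and carries its direction and name along. As a rough illustration only, assuming Hive orders structs the way Python orders tuples, the same idea can be sketched like this (`Row` and `summarize` are hypothetical names):

```python
from collections import namedtuple

# Hypothetical row type mirroring the dir_samp columns.
Row = namedtuple("Row", "id start_time end_time direction name")

ROWS = [
    Row(9202340753368000000, "2015-06-02 15:10:28.677", "2015-06-02 15:32:22.677", 3, "xyz"),
    Row(9202340753368000000, "2015-06-02 14:55:37.353", "2015-06-02 15:12:18.840", 1, "xyz"),
]


def summarize(rows):
    """Per id: the smallest (start_time, direction, name) tuple and the latest end_time."""
    groups = {}
    for r in rows:
        groups.setdefault(r.id, []).append(r)

    out = []
    for id_, grp in groups.items():
        # Tuples compare element by element, so start_time decides first;
        # direction and name only matter as tie-breakers.
        start_time, direction, name = min(
            (r.start_time, r.direction, r.name) for r in grp
        )
        end_time = max(r.end_time for r in grp)
        out.append((id_, start_time, end_time, direction, name))
    return out


if __name__ == "__main__":
    for line in summarize(ROWS):
        print(line)
```

Under this ordering, rows tied on the earliest start_time are broken by the smallest direction, then the smallest name, and both values always come from the same row.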
