
Identify missing partitions in a Hive table

My table is populated daily, and each run adds a partition on a column called date.

For example, my query generates these dates:

2018-01-01
2018-01-02
2018-01-03
2018-01-06
2018-01-08

The dates 2018-01-04, 2018-01-05 and 2018-01-07 are missing. Is there any way to identify those missing dates?

The queries below will 1) create a helper table with sequential dates from the first partition date to the latest partition date, and 2) left join it to the original table to see which partition dates are missing (partition_dt is null). Hope this helps.

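-- Build one row per calendar date between the first and the latest partition.
-- space(n) split on ' ' yields n+1 elements, so idx runs from 0 to n.
create table partition_dtes as
with cal_date as (
  select min(partition_dt) as min_dt, max(partition_dt) as max_dt
  from mytable
)
select date_add(t.min_dt, pe.idx) as series_dte
from cal_date t
lateral view posexplode(split(space(datediff(t.max_dt, t.min_dt)), ' ')) pe as idx, dte;

Result: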
2018-01-01
2018-01-02
2018-01-03
2018-01-04
2018-01-05
2018-01-06
2018-01-07
2018-01-08

-- Calendar dates with no matching partition are the missing ones.
select distinct dte.series_dte
from partition_dtes dte
left join mytable tbl
  on dte.series_dte = tbl.partition_dt
where tbl.partition_dt is null
order by dte.series_dte;

Result:
2018-01-04
2018-01-05
2018-01-07
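
If you would rather not keep a helper table around, the two queries above can be folded into a single statement. This is only a sketch under some assumptions: it reuses the mytable and partition_dt names from above, assumes partition_dt holds yyyy-MM-dd strings, and introduces the CTE names series and present purely for illustration. The cast to string is there because date_add returns a date in newer Hive releases and a string in older ones, so casting keeps the join comparison string to string either way.

```sql
-- Sketch: same logic as the two queries above, without creating partition_dtes.
-- Assumes mytable.partition_dt is a yyyy-MM-dd string; series and present are
-- illustrative CTE names, not part of the original answer.
with cal_date as (
  select min(partition_dt) as min_dt, max(partition_dt) as max_dt
  from mytable
),
series as (
  -- one row per calendar date from min_dt to max_dt inclusive
  select cast(date_add(t.min_dt, pe.idx) as string) as series_dte
  from cal_date t
  lateral view posexplode(split(space(datediff(t.max_dt, t.min_dt)), ' ')) pe as idx, dte
),
present as (
  -- collapse the table to one row per existing partition date before joining
  select distinct partition_dt
  from mytable
)
select s.series_dte
from series s
left join present p
  on s.series_dte = p.partition_dt
where p.partition_dt is null
order by s.series_dte;
```

Deduplicating the partition dates first keeps the join small when mytable has many rows per partition. Note that, like the original, this only finds gaps between the earliest and latest existing partitions; a missing date before the first or after the last partition will not show up.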
