
Self join table on previous month data to add missing records

I am trying to join an Impala table with its previous month's data to find records that are missing in the current month. The source table holds employee records. If an employee is not present in the current month but was present in the previous month, I need to label that employee as "Terminated".

I tried a left outer join on the employee name and a date condition, but it does not return the missing record. The join conditions were:

Employee in the current month equals employee in the previous month

Current reporting month equals previous reporting month

Input Data:

+---------+---------+-----------+----------------+
|employee | branch  | hire_date | reporting_month|
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190131     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190131     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190228     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190228     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190331     |
+---------+---------+-----------+----------------+
| Judy    | GIP     | 20181014  |   20190331     |
+---------+---------+-----------+----------------+
| James   | EE      | 20170101  |   20190430     |
+---------+---------+-----------+----------------+
| Max     | EEI     | 20170201  |   20190430     |
+---------+---------+-----------+----------------+

Suppose the current reporting month is 20190430 and employee Judy is not present. A record then needs to be added for Judy with the term flag set to "Terminated".

Expected Output:

+---------+---------+-----------+----------------+-----------+
|employee | branch  | hire_date | reporting_month| Term_flag |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190131     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190131     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190228     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190228     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190331     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190331     | NULL      |
+---------+---------+-----------+----------------+-----------+
| James   | EE      | 20170101  |   20190430     | NULL      |
+---------+---------+-----------+----------------+-----------+
| Judy    | GIP     | 20181014  |   20190430     |Terminated | 
+---------+---------+-----------+----------------+-----------+
| Max     | EEI     | 20170201  |   20190430     | NULL      |
+---------+---------+-----------+----------------+-----------+

I'm not sure where the magic date 20190430 comes from. The basic idea is a union all, as follows:

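-- Keep every existing row as-is, with no termination flag.
select employee, branch, hire_date, reporting_month, null as term_flag
from input
union all
-- Add one 'Terminated' row per employee whose latest month is before 2019-04.
select employee, branch, hire_date, '20190430' as reporting_month, 'Terminated' as term_flag
from (select i.*,
             -- partition by employee so that seqnum = 1 is each employee's latest row
             row_number() over (partition by employee order by reporting_month desc) as seqnum
      from input i
     ) i
where seqnum = 1 and
      -- assumes reporting_month is a yyyyMMdd STRING; <= so that a latest month of 2019-03 qualifies
      -- (use = instead to flag only employees present in the immediately preceding month)
      months_add(trunc(to_timestamp(reporting_month, 'yyyyMMdd'), 'MON'), 1) <= '2019-04-01';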

The month arithmetic may be a little tricky, because your date is the last day of the month rather than the first.
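
If you would rather not hard-code 20190430, one possible variant is a self join against the following month-end. This is only a sketch under assumptions that the question does not confirm: the table is called input as above, reporting_month is a yyyyMMdd STRING that always holds a month-end date, and the alias names (months, next_month, p, c, m) are made up for illustration. Wrapping add_months in last_day keeps the result on the month end regardless of how the month addition treats the day of month.

```sql
-- Sketch only: assumes reporting_month is a yyyyMMdd STRING holding a month-end date.
with months as (
  select i.*,
         -- month end that follows this row's reporting month, back in yyyyMMdd form
         from_timestamp(
           last_day(add_months(to_timestamp(i.reporting_month, 'yyyyMMdd'), 1)),
           'yyyyMMdd') as next_month
  from input i
)
-- all existing rows, unflagged
select employee, branch, hire_date, reporting_month, cast(null as string) as term_flag
from input
union all
-- rows for employees present in one month but absent from the next one
select p.employee, p.branch, p.hire_date, p.next_month as reporting_month, 'Terminated' as term_flag
from months p
-- only consider following months that actually exist in the table,
-- so the latest month does not flag everybody as terminated
join (select distinct reporting_month from input) m
  on m.reporting_month = p.next_month
-- keep p only when the same employee has no row in that following month
left anti join input c
  on c.employee = p.employee and c.reporting_month = p.next_month;
```

With the sample data in the question, the second branch should produce only Judy's row for 20190430. Note that an employee who returns after a gap would be flagged once for each month that directly follows a month in which they were present, so adjust the logic if rehires matter.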


 