Join two tables based on first available non-null value across multiple columns

I have 2 different tables in BigQuery, one detailing a hierarchy for my organization, and the other table containing planning values for different entities. Before I explain further, here is how the tables look:

Table A - Hierarchy

This table is defined at a granular level for each warehouse. It is essentially a flattened hierarchy (Warehouse -> District -> City -> State -> Country).

Country  State  City  District  Warehouse
C1       S1     CY1   D1        WH1
C1       S1     CY1   D1        WH2
C1       S1     CY1   D2        WH3
C1       S1     CY1   D2        WH4
C1       S1     CY2   D3        WH5
C1       S1     CY2   D3        WH6
...      ...    ...   ...       ...

Here is the other table: Table B - Planned Values

Frequency  PeriodStart  PeriodEnd   PlanAmount  Territory
MTD        01/01/2022   01/31/2022  500         WH1
YTD        01/01/2022   01/31/2022  790         WH1
...        ...          ...         ...         ...
MTD        12/01/2022   12/31/2022  340         WH1
YTD        12/01/2022   12/31/2022  1790        WH1
MTD        01/01/2022   01/31/2022  1500        D1
YTD        01/01/2022   01/31/2022  1800        D1
...        ...          ...         ...         ...
MTD        12/01/2022   12/31/2022  1200        D1
YTD        12/01/2022   12/31/2022  6600        D1

I need to join Table A and Table B in the following manner to create a new table (Table C):

  1. The driving table is Table A.
  2. Table B contains planned values for warehouses, districts, cities etc. However, these planned values may be defined at any level: sometimes at the warehouse level, and sometimes only at the country level.
  3. For every warehouse in Table A, Table C must have the corresponding plan values from Table B at the most granular level possible. For example, Table B already has plan values for warehouse WH1, but does not have plan values for WH2. So, for WH1, Table C shows the plan values as defined within Table B. But for WH2, Table C has to show the district's (D1) plan values instead. If the district-level value is not available, it has to skip to the next available level (leading all the way up to the country level).

Is anyone able to help me with the logic to create this type of a join?

I am unable to think of the logical way to approach this since I am rather new to SQL. My approach was to create multiple left joins across each level and then use a coalesce, but I fear this will create duplicate values.

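First, I extract all distinct dates from the PeriodStart column of Table B (month_list), so there should be a row with values for each month. If a single row is meant to apply to several months, split it into monthly rows first (unnest). Table A is then repeated for each date in Table B, and Table B is left joined once per hierarchy level. For each level, the script takes the largest PlanAmount per month and territory. If, for a given month, there is any match between Territory and Warehouse, the maximum PlanAmount of those Table B rows is taken. Otherwise (ifnull) it checks for a match between District and Territory, and the same pattern extends to the higher levels. Note that the query groups by month and the hierarchy down to District, so it returns one row per district; add Warehouse to the SELECT list and the GROUP BY to get one row per warehouse.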

# Sample hierarchy matching Table A: two warehouses per district, four per city.
with tblA as (
  select "C1" Country, "S1" State,
    "CY" || cast(1 + div(x - 1, 4) as string) City,
    "D" || cast(1 + div(x - 1, 2) as string) District,
    "WH" || cast(x as string) Warehouse
  from unnest([1,2,3,4,5,6]) x
),
# Sample plan values (a subset of Table B).
tblB as (
  select date("2022-01-01") PeriodStart, 500 PlanAmount, "WH1" Territory
  union all select date("2022-12-01"), 340, "WH1"
  union all select date("2022-12-01"), 1500, "D1"
),
# Option 1: generate all months between the first and last PeriodStart.
months as (
  select * from unnest(generate_date_array(
    (select min(PeriodStart) from tblB),
    (select max(PeriodStart) from tblB),
    interval 1 month)) as date_month
),
# Option 2: only the months that actually occur in tblB.
month_list as (select distinct PeriodStart as date_month from tblB)

select
  date_month, Country, State, City, District,
  # first non-null plan, from the most granular level upwards
  ifnull(ifnull(max(WHplan), max(Distplan)), max(Stateplan)) as plan
from (
  select date_month, tblA.*,
    WH.PlanAmount as WHplan,
    Dist.PlanAmount as Distplan,
    St.PlanAmount as Stateplan
  from tblA
  cross join month_list  # or: cross join months
  left join tblB WH
    on tblA.Warehouse = WH.Territory and date_month = WH.PeriodStart
  left join tblB Dist
    on tblA.District = Dist.Territory and date_month = Dist.PeriodStart
  # state level: join on tblA.State (the City and Country levels follow the same pattern)
  left join tblB St
    on tblA.State = St.Territory and date_month = St.PeriodStart
)
group by 1,2,3,4,5

Please let me know if your dataset is too large for this many joins.
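
The same fallback idea can also be sketched with one row per warehouse, all five hierarchy levels, and the Frequency column kept. This is only a minimal sketch under stated assumptions, not a tested solution for the real data: the CTE names (tblA, tblB, plan_values, periods) and the sample rows are made up for illustration, territory codes are assumed to be unique across levels (a district is never named like a warehouse), and MAX is an assumed way to resolve duplicate plan rows. Table B is first reduced to one row per Territory, Frequency and PeriodStart, so each left join can match at most one row and the joins should not duplicate warehouses.

```sql
WITH tblA AS (
  -- Assumed sample hierarchy matching Table A in the question.
  SELECT 'C1' AS Country, 'S1' AS State,
         'CY' || CAST(1 + DIV(x - 1, 4) AS STRING) AS City,
         'D'  || CAST(1 + DIV(x - 1, 2) AS STRING) AS District,
         'WH' || CAST(x AS STRING) AS Warehouse
  FROM UNNEST([1, 2, 3, 4, 5, 6]) AS x
),
tblB AS (
  -- Assumed sample plan rows (a subset of Table B in the question).
  SELECT 'MTD' AS Frequency, DATE '2022-01-01' AS PeriodStart,
         500 AS PlanAmount, 'WH1' AS Territory
  UNION ALL SELECT 'YTD', DATE '2022-01-01', 790,  'WH1'
  UNION ALL SELECT 'MTD', DATE '2022-01-01', 1500, 'D1'
  UNION ALL SELECT 'YTD', DATE '2022-01-01', 1800, 'D1'
),
plan_values AS (
  -- One row per territory, frequency and period, so a join matches at most once.
  SELECT Territory, Frequency, PeriodStart, MAX(PlanAmount) AS PlanAmount
  FROM tblB
  GROUP BY Territory, Frequency, PeriodStart
),
periods AS (
  -- Every frequency/period combination that has any plan at all.
  SELECT DISTINCT Frequency, PeriodStart FROM tblB
)
SELECT
  a.Country, a.State, a.City, a.District, a.Warehouse,
  p.Frequency, p.PeriodStart,
  -- First non-null value wins, from the most to the least granular level.
  COALESCE(wh.PlanAmount, di.PlanAmount, ci.PlanAmount,
           st.PlanAmount, co.PlanAmount) AS PlanAmount
FROM tblA AS a
CROSS JOIN periods AS p
LEFT JOIN plan_values AS wh
  ON wh.Territory = a.Warehouse
 AND wh.Frequency = p.Frequency AND wh.PeriodStart = p.PeriodStart
LEFT JOIN plan_values AS di
  ON di.Territory = a.District
 AND di.Frequency = p.Frequency AND di.PeriodStart = p.PeriodStart
LEFT JOIN plan_values AS ci
  ON ci.Territory = a.City
 AND ci.Frequency = p.Frequency AND ci.PeriodStart = p.PeriodStart
LEFT JOIN plan_values AS st
  ON st.Territory = a.State
 AND st.Frequency = p.Frequency AND st.PeriodStart = p.PeriodStart
LEFT JOIN plan_values AS co
  ON co.Territory = a.Country
 AND co.Frequency = p.Frequency AND co.PeriodStart = p.PeriodStart
ORDER BY a.Warehouse, p.PeriodStart, p.Frequency
```

With these sample rows, WH1 keeps its own values (500 and 790), WH2 falls back to the D1 values (1500 and 1800), and warehouses with no plan at any level get NULL. If NULL rows are not wanted in Table C, a filter on the COALESCE result could be added.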
