I have two tables in BigQuery: one describes my organization's hierarchy, and the other contains planning values for different entities. Before I explain further, here is how the tables look:
Table A - Hierarchy
This table is defined at a granular level, with one row per warehouse. It is essentially a flattened hierarchy (Warehouse -> District -> City -> State -> Country).
Country | State | City | District | Warehouse |
---|---|---|---|---|
C1 | S1 | CY1 | D1 | WH1 |
C1 | S1 | CY1 | D1 | WH2 |
C1 | S1 | CY1 | D2 | WH3 |
C1 | S1 | CY1 | D2 | WH4 |
C1 | S1 | CY2 | D3 | WH5 |
C1 | S1 | CY2 | D3 | WH6 |
... | ... | ... | ... | ... |
Here is the other table:
Table B - Planned Values
Frequency | PeriodStart | PeriodEnd | PlanAmount | Territory |
---|---|---|---|---|
MTD | 01/01/2022 | 01/31/2022 | 500 | WH1 |
YTD | 01/01/2022 | 01/31/2022 | 790 | WH1 |
... | ... | ... | ... | ... |
MTD | 12/01/2022 | 12/31/2022 | 340 | WH1 |
YTD | 12/01/2022 | 12/31/2022 | 1790 | WH1 |
MTD | 01/01/2022 | 01/31/2022 | 1500 | D1 |
YTD | 01/01/2022 | 01/31/2022 | 1800 | D1 |
... | ... | ... | ... | ... |
MTD | 12/01/2022 | 12/31/2022 | 1200 | D1 |
YTD | 12/01/2022 | 12/31/2022 | 6600 | D1 |
I need to join Table A and Table B in the following manner to create a new table (Table C):
Can anyone help me with the logic for this type of join?
I can't think of a logical way to approach this, since I am rather new to SQL. My approach was to create one left join per hierarchy level and then use COALESCE, but I fear this will create duplicate values.
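A per-level LEFT JOIN plus COALESCE only duplicates a warehouse row when a join matches more than one Table B row. In the sample Table B every territory has both an MTD and a YTD row for the same month, so pinning `Frequency` in the join condition (not just the date) keeps one row per warehouse. The sketch below uses Python's built-in `sqlite3` as a stand-in for BigQuery; the table names `table_a`/`table_b`, the hard-coded month, and the choice of MTD are hypothetical assumptions for illustration.

```python
import sqlite3

# Hypothetical cut-down copy of the sample data (SQLite stands in for BigQuery).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table_a (Country TEXT, State TEXT, City TEXT, District TEXT, Warehouse TEXT);
INSERT INTO table_a VALUES
  ('C1','S1','CY1','D1','WH1'), ('C1','S1','CY1','D1','WH2'),
  ('C1','S1','CY1','D2','WH3'), ('C1','S1','CY1','D2','WH4');
CREATE TABLE table_b (Frequency TEXT, PeriodStart TEXT, PlanAmount INTEGER, Territory TEXT);
INSERT INTO table_b VALUES
  ('MTD','2022-01-01',500,'WH1'), ('YTD','2022-01-01',790,'WH1'),
  ('MTD','2022-01-01',1500,'D1'), ('YTD','2022-01-01',1800,'D1');
""")

def plan_rows(pin_frequency):
    # One LEFT JOIN per hierarchy level; COALESCE keeps the most specific match.
    def freq(alias):
        # Assumption: the MTD figure is the one wanted in Table C.
        return f"AND {alias}.Frequency = 'MTD'" if pin_frequency else ""
    sql = f"""
        SELECT a.Warehouse, COALESCE(wh.PlanAmount, d.PlanAmount) AS plan
        FROM table_a AS a
        LEFT JOIN table_b AS wh
          ON wh.Territory = a.Warehouse AND wh.PeriodStart = '2022-01-01' {freq('wh')}
        LEFT JOIN table_b AS d
          ON d.Territory = a.District AND d.PeriodStart = '2022-01-01' {freq('d')}
        ORDER BY a.Warehouse
    """
    return conn.execute(sql).fetchall()

duplicated = plan_rows(pin_frequency=False)  # MTD and YTD rows both match
one_per_wh = plan_rows(pin_frequency=True)   # exactly one row per warehouse
print(len(duplicated), len(one_per_wh))
print(one_per_wh)
```

Without the Frequency condition, WH1 fans out to four rows (two warehouse matches times two district matches); with it, each warehouse appears once and WH2 falls back to the D1 plan.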
First, I extract all dates from the PeriodStart column in tableB, so there should be one row with values for each month. If a row should apply to several months, split it into one row per month first (UNNEST). Table A is then repeated for each of these dates. For each month and territory, the script takes the largest PlanAmount in tableB. If, for a given month, any Territory matches the Warehouse, the maximum PlanAmount of those tableB rows is used. Otherwise (IFNULL/COALESCE), it checks for a match between District and Territory, and then between State and Territory.
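The fallback described above can be sketched in plain Python to make the lookup order explicit: keep the largest amount per (month, territory), then try the warehouse, the district and the state in that order. This is an illustrative sketch, not the answer's query; it uses the same test rows as `tblB` below, keeps one row per warehouse (the SQL additionally groups to district level), and `build_plan` is a hypothetical name.

```python
from datetime import date

# Table A rows: (Country, State, City, District, Warehouse), mirroring the sample.
table_a = [
    ("C1", "S1", "CY1", "D1", "WH1"),
    ("C1", "S1", "CY1", "D1", "WH2"),
    ("C1", "S1", "CY1", "D2", "WH3"),
    ("C1", "S1", "CY1", "D2", "WH4"),
    ("C1", "S1", "CY2", "D3", "WH5"),
    ("C1", "S1", "CY2", "D3", "WH6"),
]
# Table B rows: (PeriodStart, PlanAmount, Territory), same test data as tblB.
table_b = [
    (date(2022, 1, 1), 500, "WH1"),
    (date(2022, 12, 1), 340, "WH1"),
    (date(2022, 12, 1), 1500, "D1"),
]

def build_plan(table_a, table_b):
    # Largest PlanAmount per (month, territory), like MAX() over each join.
    best = {}
    for period_start, amount, territory in table_b:
        key = (period_start, territory)
        best[key] = max(amount, best.get(key, amount))
    months = sorted({row[0] for row in table_b})  # like month_list
    result = []
    for month in months:
        for country, state, city, district, warehouse in table_a:
            # Warehouse match first, then district, then state (the IFNULL chain).
            plan = next(
                (best[(month, t)] for t in (warehouse, district, state) if (month, t) in best),
                None,
            )
            result.append((month, warehouse, plan))
    return result

for row in build_plan(table_a, table_b):
    print(row)
```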
```sql
-- Sample data mirroring Table A: WH1-WH2 -> D1, WH3-WH4 -> D2, WH5-WH6 -> D3
WITH tblA AS (
  SELECT "C1" AS Country, "S1" AS State,
         "CY" || CAST(1 + DIV(x - 1, 4) AS STRING) AS City,
         "D" || CAST(1 + DIV(x - 1, 2) AS STRING) AS District,
         "WH" || CAST(x AS STRING) AS Warehouse
  FROM UNNEST([1, 2, 3, 4, 5, 6]) AS x
),
tblB AS (
  SELECT DATE("2022-01-01") AS PeriodStart, 500 AS PlanAmount, "WH1" AS Territory
  UNION ALL SELECT DATE("2022-12-01"), 340, "WH1"
  UNION ALL SELECT DATE("2022-12-01"), 1500, "D1"
),
-- Option 1: every month between the first and last PeriodStart
months AS (
  SELECT date_month
  FROM UNNEST(GENERATE_DATE_ARRAY(
    (SELECT MIN(PeriodStart) FROM tblB),
    (SELECT MAX(PeriodStart) FROM tblB),
    INTERVAL 1 MONTH)) AS date_month
),
-- Option 2: only the months that actually occur in tblB
month_list AS (
  SELECT DISTINCT PeriodStart AS date_month FROM tblB
)
SELECT
  date_month, Country, State, City, District,
  -- Warehouse plan first, then district, then state
  COALESCE(MAX(WHPlan), MAX(DistPlan), MAX(StatePlan)) AS plan
FROM (
  SELECT
    date_month, tblA.*,
    WH.PlanAmount AS WHPlan,
    Dist.PlanAmount AS DistPlan,
    St.PlanAmount AS StatePlan
  FROM tblA
  CROSS JOIN month_list  -- or: CROSS JOIN months
  LEFT JOIN tblB AS WH
    ON tblA.Warehouse = WH.Territory AND date_month = WH.PeriodStart
  LEFT JOIN tblB AS Dist
    ON tblA.District = Dist.Territory AND date_month = Dist.PeriodStart
  LEFT JOIN tblB AS St
    ON tblA.State = St.Territory AND date_month = St.PeriodStart
)
GROUP BY 1, 2, 3, 4, 5
```
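To try the same query shape without a BigQuery project, here is a port to SQLite via Python's standard `sqlite3` module. It is a sketch under assumptions: the sample rows are written out literally (SQLite has no `DIV` or `GENERATE_DATE_ARRAY`), dates are stored as ISO text, and only the `month_list` option is used.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblA (Country TEXT, State TEXT, City TEXT, District TEXT, Warehouse TEXT);
INSERT INTO tblA VALUES
  ('C1','S1','CY1','D1','WH1'), ('C1','S1','CY1','D1','WH2'),
  ('C1','S1','CY1','D2','WH3'), ('C1','S1','CY1','D2','WH4'),
  ('C1','S1','CY2','D3','WH5'), ('C1','S1','CY2','D3','WH6');
CREATE TABLE tblB (PeriodStart TEXT, PlanAmount INTEGER, Territory TEXT);
INSERT INTO tblB VALUES
  ('2022-01-01', 500, 'WH1'),
  ('2022-12-01', 340, 'WH1'),
  ('2022-12-01', 1500, 'D1');
""")

query = """
WITH month_list AS (SELECT DISTINCT PeriodStart AS date_month FROM tblB)
SELECT date_month, Country, State, City, District,
       -- warehouse plan first, then district, then state
       COALESCE(MAX(WHPlan), MAX(DistPlan), MAX(StatePlan)) AS plan
FROM (
  SELECT m.date_month, a.Country, a.State, a.City, a.District,
         wh.PlanAmount AS WHPlan, d.PlanAmount AS DistPlan, s.PlanAmount AS StatePlan
  FROM tblA AS a
  CROSS JOIN month_list AS m
  LEFT JOIN tblB AS wh ON a.Warehouse = wh.Territory AND m.date_month = wh.PeriodStart
  LEFT JOIN tblB AS d  ON a.District  = d.Territory  AND m.date_month = d.PeriodStart
  LEFT JOIN tblB AS s  ON a.State     = s.Territory  AND m.date_month = s.PeriodStart
)
GROUP BY date_month, Country, State, City, District
ORDER BY date_month, District
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Because MAX ignores the duplicated rows that the three joins can produce, the fan-out does not change the result. Note that with this grouping, D1 in December returns 340 (a warehouse-level plan inside D1) rather than the 1500 district plan, which is worth checking against the Table C you expect.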
Please let me know if your dataset is too large for these joins.
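The answer mentions that a Table B row covering several months should first be split into one row per month. Since Table B has both PeriodStart and PeriodEnd, that step could look like the Python sketch below. It is similar in spirit to `GENERATE_DATE_ARRAY(start, end, INTERVAL 1 MONTH)` but normalizes to the first of each month; `month_starts` and `split_monthly` are hypothetical helpers, and copying PlanAmount to every month (rather than prorating it) is an assumption.

```python
from datetime import date

def month_starts(period_start, period_end):
    # First day of every month from period_start's month through period_end.
    months = []
    year, month = period_start.year, period_start.month
    while date(year, month, 1) <= period_end:
        months.append(date(year, month, 1))
        year, month = (year + 1, 1) if month == 12 else (year, month + 1)
    return months

def split_monthly(rows):
    # rows: (PeriodStart, PeriodEnd, PlanAmount, Territory).
    # Assumption: the amount is copied to each month, not prorated.
    return [
        (m, amount, territory)
        for start, end, amount, territory in rows
        for m in month_starts(start, end)
    ]

print(split_monthly([(date(2022, 1, 1), date(2022, 3, 31), 1500, "D1")]))
```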