Join two tables based on first available non-null value across multiple columns
I have two tables in BigQuery: one describes the hierarchy of my organization, and the other contains planning values for different entities. Before I explain further, here is how the tables look:
Table A - Hierarchy

This table is defined at a granular level, with one row per warehouse. It is essentially a flattened hierarchy (Warehouse -> District -> City -> State -> Country).

| Country | State | City | District | Warehouse |
|---|---|---|---|---|
| C1 | S1 | CY1 | D1 | WH1 |
| C1 | S1 | CY1 | D1 | WH2 |
| C1 | S1 | CY1 | D2 | WH3 |
| C1 | S1 | CY1 | D2 | WH4 |
| C1 | S1 | CY2 | D3 | WH5 |
| C1 | S1 | CY2 | D3 | WH6 |
| ... | ... | ... | ... | ... |

Here is the other table: Table B - Planned Values

| Frequency | PeriodStart | PeriodEnd | PlanAmount | Territory |
|---|---|---|---|---|
| MTD | 01/01/2022 | 01/31/2022 | 500 | WH1 |
| YTD | 01/01/2022 | 01/31/2022 | 790 | WH1 |
| ... | ... | ... | ... | ... |
| MTD | 12/01/2022 | 12/31/2022 | 340 | WH1 |
| YTD | 12/01/2022 | 12/31/2022 | 1790 | WH1 |
| MTD | 01/01/2022 | 01/31/2022 | 1500 | D1 |
| YTD | 01/01/2022 | 01/31/2022 | 1800 | D1 |
| ... | ... | ... | ... | ... |
| MTD | 12/01/2022 | 12/31/2022 | 1200 | D1 |
| YTD | 12/01/2022 | 12/31/2022 | 6600 | D1 |

I need to join Table A and Table B in the following manner to create a new table (Table C):
Can anyone help me with the logic for this type of join?

I am rather new to SQL and cannot think of a logical way to approach this. My idea was to create a left join for each level of the hierarchy and then use a coalesce, but I fear this will create duplicate values.
First I extract all distinct dates from the `PeriodStart` column of tableB, so there should be one row with values for each month. If a single row is meant to apply to several months, split it into monthly rows first (unnest).

Table A is then repeated for each of those dates. For each month and territory, the script takes the largest value from tableB: if any `Territory` matches the warehouse for that month, the maximum `PlanAmount` of those rows is used. Otherwise (`ifnull`) it checks for a match between `District` and `Territory`, and after that it falls back to the state level in the same way.

with tblA as (
  # sample hierarchy matching Table A:
  # WH1-WH2 -> D1, WH3-WH4 -> D2, WH5-WH6 -> D3; WH1-WH4 -> CY1, WH5-WH6 -> CY2
  select "C1" Country, "S1" State,
    "CY" || cast(1 + div(x - 1, 4) as string) City,
    "D" || cast(1 + div(x - 1, 2) as string) District,
    "WH" || cast(x as string) Warehouse
  from unnest([1,2,3,4,5,6]) x
),
tblB as (
  select date("2022-01-01") PeriodStart, 500 PlanAmount, "WH1" Territory
  union all select date("2022-12-01"), 340, "WH1"
  union all select date("2022-12-01"), 1500, "D1"
),
# option 1: generate all months between the first and the last PeriodStart
months as (
  select date_month
  from unnest(generate_date_array(
    (select min(PeriodStart) from tblB),
    (select max(PeriodStart) from tblB),
    interval 1 month)) as date_month
),
# option 2: only the months that actually occur in tblB
month_list as (select distinct PeriodStart as date_month from tblB)
select
  date_month, Country, State, City, District,
  # first non-null plan: warehouse level, then district, then state
  ifnull(ifnull(max(WHplan), max(Distplan)), max(Stateplan)) as plan
from (
  select date_month, tblA.*,
    wh.PlanAmount as WHplan,
    dist.PlanAmount as Distplan,
    st.PlanAmount as Stateplan
  from tblA
  cross join month_list  # or: cross join months
  left join tblB wh
    on tblA.Warehouse = wh.Territory and date_month = wh.PeriodStart
  left join tblB dist
    on tblA.District = dist.Territory and date_month = dist.PeriodStart
  left join tblB st
    on tblA.State = st.Territory and date_month = st.PeriodStart
)
group by 1,2,3,4,5
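
The monthly split mentioned in the explanation could look roughly like this. It is only a sketch: it assumes `PeriodStart` and `PeriodEnd` are `DATE` columns, the quarter-long sample row is made up for illustration, and `month_start` is a name introduced here.

```sql
-- Hypothetical input: one plan row that covers a whole quarter.
WITH tblB AS (
  SELECT DATE '2022-01-01' AS PeriodStart,
         DATE '2022-03-31' AS PeriodEnd,
         900 AS PlanAmount,
         'WH1' AS Territory
)
SELECT
  month_start AS PeriodStart,                   -- first day of each covered month
  LAST_DAY(month_start, MONTH) AS PeriodEnd,    -- last day of that month
  PlanAmount,                                   -- copied unchanged to every month
  Territory
FROM tblB,
  -- one array element per month between PeriodStart and PeriodEnd
  UNNEST(GENERATE_DATE_ARRAY(
    DATE_TRUNC(PeriodStart, MONTH), PeriodEnd, INTERVAL 1 MONTH)) AS month_start
ORDER BY month_start;
```

This returns one row per month (January, February and March 2022), each carrying the full `PlanAmount`. Whether the amount should be copied like this or divided across the months depends on what the plan value means.
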
Please let me know if your dataset is too large for these joins.
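
If the repeated self-joins turn out to be too heavy, one possible alternative is to join Table B only once and rank the matches by hierarchy level. This is a sketch under assumptions, not the method from the answer above: the `levels` CTE and the names `priority`, `territory_key` and `matched_level` are made up here, and it only works if territory names are unique across levels (for example, no district named like a warehouse).

```sql
-- Small sample in the shape of Table A and Table B from the question.
WITH tblA AS (
  SELECT 'C1' AS Country, 'S1' AS State, 'CY1' AS City, 'D1' AS District, 'WH1' AS Warehouse
  UNION ALL SELECT 'C1', 'S1', 'CY1', 'D1', 'WH2'
  UNION ALL SELECT 'C1', 'S1', 'CY1', 'D2', 'WH3'
),
tblB AS (
  SELECT 'MTD' AS Frequency, DATE '2022-01-01' AS PeriodStart, 500 AS PlanAmount, 'WH1' AS Territory
  UNION ALL SELECT 'MTD', DATE '2022-01-01', 1500, 'D1'
),
-- Turn each hierarchy row into one (priority, key) pair per level;
-- priority 1 is the most specific level (assumed ordering).
levels AS (
  SELECT a.Warehouse, lvl.priority, lvl.territory_key
  FROM tblA AS a,
    UNNEST([
      STRUCT(1 AS priority, a.Warehouse AS territory_key),
      STRUCT(2, a.District),
      STRUCT(3, a.City),
      STRUCT(4, a.State),
      STRUCT(5, a.Country)
    ]) AS lvl
)
SELECT
  l.Warehouse,
  b.Frequency,
  b.PeriodStart,
  -- keep only the plan from the most specific level that has one
  ARRAY_AGG(b.PlanAmount ORDER BY l.priority LIMIT 1)[OFFSET(0)] AS PlanAmount,
  MIN(l.priority) AS matched_level
FROM levels AS l
JOIN tblB AS b
  ON b.Territory = l.territory_key
GROUP BY l.Warehouse, b.Frequency, b.PeriodStart
ORDER BY l.Warehouse;
```

With this sample, WH1 keeps its own plan of 500, WH2 falls back to the D1 plan of 1500, and WH3 drops out because no level has a plan for it (a `LEFT JOIN` would keep it with a null plan). Because the result is grouped per warehouse, frequency and period, it cannot contain duplicate rows; if Table B itself holds two rows for the same territory, frequency and period, one of them is picked arbitrarily.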