
Join two tables based on first available non-null value across multiple columns

I have two tables in BigQuery: one describes the hierarchy of my organization, and the other contains planning values for different entities. Before I explain further, here is how the tables look:

Table A - Hierarchy. This table is defined at the most granular level, with one row per warehouse. It is essentially a flattened hierarchy (Warehouse -> District -> City -> State -> Country).

| Country | State | City | District | Warehouse |
|---------|-------|------|----------|-----------|
| C1      | S1    | CY1  | D1       | WH1       |
| C1      | S1    | CY1  | D1       | WH2       |
| C1      | S1    | CY1  | D2       | WH3       |
| C1      | S1    | CY1  | D2       | WH4       |
| C1      | S1    | CY2  | D3       | WH5       |
| C1      | S1    | CY2  | D3       | WH6       |
| ...     | ...   | ...  | ...      | ...       |

Here is the other table: Table B - Planned Values

| Frequency | PeriodStart | PeriodEnd  | PlanAmount | Territory |
|-----------|-------------|------------|------------|-----------|
| MTD       | 01/01/2022  | 01/31/2022 | 500        | WH1       |
| YTD       | 01/01/2022  | 01/31/2022 | 790        | WH1       |
| ...       | ...         | ...        | ...        | ...       |
| MTD       | 12/01/2022  | 12/31/2022 | 340        | WH1       |
| YTD       | 12/01/2022  | 12/31/2022 | 1790       | WH1       |
| MTD       | 01/01/2022  | 01/31/2022 | 1500       | D1        |
| YTD       | 01/01/2022  | 01/31/2022 | 1800       | D1        |
| ...       | ...         | ...        | ...        | ...       |
| MTD       | 12/01/2022  | 12/31/2022 | 1200       | D1        |
| YTD       | 12/01/2022  | 12/31/2022 | 6600       | D1        |

I need to join Table A and Table B in the following manner to create a new table (Table C):

  1. The driving table is Table A.
  2. Table B contains planned values for warehouses, districts, cities, and so on. However, these planned values may be defined at any level: sometimes at the warehouse level, and sometimes only at the country level.
  3. For every warehouse in Table A, Table C must have the corresponding plan values from Table B at the most granular level possible. For example, Table B already has plan values for warehouse WH1, but does not have plan values for WH2. So, for WH1, Table C shows the plan values as defined in Table B. But for WH2, Table C has to show the district's (D1) plan values instead. If the district-level value is not available, it has to fall back to the next available level (all the way up to the country level).
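
The fallback rule in point 3 can be sketched outside SQL as a lookup that walks the hierarchy from the most granular level upwards. This is only an illustration under stated assumptions: the names `LEVELS`, `plan_by_territory`, and `resolve_plan` are hypothetical, and the data is a small excerpt of the January 2022 MTD rows shown above.

```python
from typing import Optional, Tuple

# Hierarchy levels, most granular first (hypothetical helper constant).
LEVELS = ["Warehouse", "District", "City", "State", "Country"]

# Excerpt of Table A.
hierarchy = [
    {"Country": "C1", "State": "S1", "City": "CY1", "District": "D1", "Warehouse": "WH1"},
    {"Country": "C1", "State": "S1", "City": "CY1", "District": "D1", "Warehouse": "WH2"},
    {"Country": "C1", "State": "S1", "City": "CY1", "District": "D2", "Warehouse": "WH3"},
]

# January 2022 MTD plan amounts from Table B, keyed by Territory.
plan_by_territory = {"WH1": 500, "D1": 1500}


def resolve_plan(row: dict) -> Optional[Tuple[str, str, int]]:
    """Return (level, territory, amount) for the most granular level with a plan."""
    for level in LEVELS:
        territory = row[level]
        if territory in plan_by_territory:
            return level, territory, plan_by_territory[territory]
    return None  # no plan defined at any level


resolved = {row["Warehouse"]: resolve_plan(row) for row in hierarchy}
print(resolved)
```

In this excerpt WH1 resolves to its own plan, WH2 falls back to district D1, and WH3 has no plan at any level.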

Is anyone able to help me with the logic to create this type of join?

I cannot think of a logical way to approach this, since I am rather new to SQL. My idea was to create one left join per level and then use a coalesce, but I fear this will create duplicate values.

First, I extract all distinct dates from the PeriodStart column of table B, so every month that occurs there gets its own row. If a row in table B covers several months, split it into monthly rows first (for example with unnest). Table A is then repeated for each of these dates. For each month and territory, the script takes the largest PlanAmount. If any Territory in table B matches the warehouse for that month, the maximum PlanAmount of those rows is used. Otherwise (ifnull) the district is checked for a match against Territory, and after that the state.

with tblA as (
  # sample hierarchy: WH1, WH2 -> D1; WH3, WH4 -> D2; WH5, WH6 -> D3; D1, D2 -> CY1; D3 -> CY2
  select "C1" Country, "S1" State,
    "CY" || cast(1 + div(x - 1, 4) as string) City,
    "D" || cast(1 + div(x - 1, 2) as string) District,
    "WH" || cast(x as string) Warehouse
  from unnest([1,2,3,4,5,6]) x
),
tblB as (
  select date("2022-01-01") PeriodStart, 500 PlanAmount, "WH1" Territory
  union all select date("2022-12-01"), 340, "WH1"
  union all select date("2022-12-01"), 1500, "D1"
),
# every month between the first and last PeriodStart
months as (
  select * from unnest(generate_date_array(
    (select min(PeriodStart) from tblB),
    (select max(PeriodStart) from tblB),
    interval 1 month)) as date_month
),
# only the months that actually occur in tblB
month_list as (select distinct PeriodStart as date_month from tblB)

select
  date_month, Country, State, City, District, Warehouse,
  # first non-null plan, from the most granular level upwards
  ifnull(ifnull(max(WHplan), max(Distplan)), max(Stateplan)) as plan
from (
  select date_month, tblA.*,
    wh.PlanAmount as WHplan,
    dist.PlanAmount as Distplan,
    st.PlanAmount as Stateplan
  from tblA,
  # months  # generate all months in between, OR use:
  month_list
  left join tblB wh
    on tblA.Warehouse = wh.Territory and date_month = wh.PeriodStart
  left join tblB dist
    on tblA.District = dist.Territory and date_month = dist.PeriodStart
  # the state level must be matched on tblA.State, not on tblA.District
  left join tblB st
    on tblA.State = st.Territory and date_month = st.PeriodStart
  # the City and Country levels can be added with the same join pattern
)
group by 1, 2, 3, 4, 5, 6
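
On the concern about duplicate rows: each left join can match at most one row as long as table B holds at most one row per (Territory, PeriodStart, Frequency), so the coalesce approach then yields exactly one row per warehouse, period, and frequency. The sketch below checks this on a small sample. It runs on SQLite through Python's `sqlite3` module so it can be tried locally, which means the dialect differs slightly from BigQuery. Joining on Frequency to keep MTD and YTD apart is an assumption about the intended result, and the aliases (`wh`, `dist`, `cty`, `st`, `ctry`, `p`) are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblA (Country TEXT, State TEXT, City TEXT, District TEXT, Warehouse TEXT);
INSERT INTO tblA VALUES
  ('C1','S1','CY1','D1','WH1'),
  ('C1','S1','CY1','D1','WH2'),
  ('C1','S1','CY1','D2','WH3');
CREATE TABLE tblB (Frequency TEXT, PeriodStart TEXT, PlanAmount INTEGER, Territory TEXT);
INSERT INTO tblB VALUES
  ('MTD','2022-01-01', 500,'WH1'),
  ('YTD','2022-01-01', 790,'WH1'),
  ('MTD','2022-01-01',1500,'D1'),
  ('YTD','2022-01-01',1800,'D1');
""")

# One left join per hierarchy level; COALESCE picks the most granular match.
query = """
SELECT a.Warehouse, p.Frequency, p.PeriodStart,
       COALESCE(wh.PlanAmount, dist.PlanAmount, cty.PlanAmount,
                st.PlanAmount, ctry.PlanAmount) AS PlanAmount
FROM tblA a
CROSS JOIN (SELECT DISTINCT Frequency, PeriodStart FROM tblB) p
LEFT JOIN tblB wh   ON wh.Territory = a.Warehouse
                   AND wh.Frequency = p.Frequency AND wh.PeriodStart = p.PeriodStart
LEFT JOIN tblB dist ON dist.Territory = a.District
                   AND dist.Frequency = p.Frequency AND dist.PeriodStart = p.PeriodStart
LEFT JOIN tblB cty  ON cty.Territory = a.City
                   AND cty.Frequency = p.Frequency AND cty.PeriodStart = p.PeriodStart
LEFT JOIN tblB st   ON st.Territory = a.State
                   AND st.Frequency = p.Frequency AND st.PeriodStart = p.PeriodStart
LEFT JOIN tblB ctry ON ctry.Territory = a.Country
                   AND ctry.Frequency = p.Frequency AND ctry.PeriodStart = p.PeriodStart
ORDER BY a.Warehouse, p.Frequency
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

If table B can contain several rows for the same territory, period, and frequency, the joins do multiply rows, and an aggregate such as the max used in the query above is one way to collapse them again.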

Please let me know if your dataset is too large for these joins.
