Snowflake Conditions in Window Function

I am trying to write a window function to retrieve a single field from a second table that I need to join to my existing table. The issue is that picking the correct value out of many possible values requires matching two IDs and then, from the rows where both IDs match, pulling the most recent one that is not after a date (which is different for each record and comes from the initial table).

Right now I have written:

Select distinct primary_id,
       first_value(desired_column) over partition by id_1, id_2, order by date desc)
From base_table
Left join second_table 
on second_table.id_1 = base_table.id_1 and 
   second_table.date <= base_table.date

However, this still returns incorrect values. The returned table should have the same row count as base_table, with desired_column added from whichever record matches the IDs and also falls before the base_table date (each desired_column value should be a single result: the most recent one before the base_table date that matches the IDs). The query returns the same row count, but the desired_column values are completely incorrect. (I suspect this is because I don't apply the second date <= base date condition in the window function directly, but that isn't possible? I'm not sure how to proceed.)

Thank you in advance.

Edit to add:

Sample Base Table

| Primary Key | ID1 | ID2 | Date       |
|-------------|-----|-----|------------|
| 1           | 123 | 321 | 01/22/2021 |
| 2           | 123 | 654 | 09/02/2022 |
| 3           | 234 | 432 | 02/02/2019 |

Sample Second Table

| Desired_Column | ID1 | ID2 | Date       |
|----------------|-----|-----|------------|
| q              | 123 | 321 | 01/21/2021 |
| r              | 123 | 654 | 09/03/2022 |
| w              | 234 | 432 | 02/01/2019 |
| s              | 234 | 432 | 03/20/2022 |
| a              | 123 | 439 | 02/20/2022 |
| w              | 999 | 999 | 09/10/2022 |
| null           | 234 | 987 | 10/10/2020 |

Desired Output

| Primary Key | ID1 | ID2 | Date       | Desired_Column |
|-------------|-----|-----|------------|----------------|
| 1           | 123 | 321 | 01/22/2021 | q              |
| 2           | 123 | 654 | 09/02/2022 | null           |
| 3           | 234 | 432 | 02/02/2019 | w              |

So, making some CTEs for the data:

with base_table(primary_id, ID_1, ID_2, Date) as (
    select * from values
    (1, 123, 321, '01/22/2021'::date),
    (2, 123, 654, '09/02/2022'::date),
    (3, 234, 432, '02/02/2019'::date)
), second_table(Desired_Column, ID_1, ID_2, Date) as (
    select * from values
    ('q'    ,123, 321, '01/21/2021'::date),
    ('r'    ,123, 654, '09/03/2022'::date),
    ('w'    ,234, 432, '02/01/2019'::date),
    ('s'    ,234, 432, '03/20/2022'::date),
    ('a'    ,123, 439, '02/20/2022'::date),
    ('w'    ,999, 999, '09/10/2022'::date),
    (null   ,234, 987, '10/10/2020'::date)
)

and then correcting your SQL:

Select distinct b.primary_id,
       first_value(s.desired_column) over (partition by b.id_1, b.id_2 order by s.date desc)
From base_table as b
Left join second_table as s
on s.id_1 = b.id_1 and 
   s.date <= b.date

gives:

| PRIMARY_ID | FIRST_VALUE(S.DESIRED_COLUMN) OVER (PARTITION BY B.ID_1, B.ID_2 ORDER BY S.DATE DESC) |
|------------|------------------------------------------------------------------------------------------|
| 1          | q                                                                                        |
| 2          | a                                                                                        |
| 3          | w                                                                                        |

But the DISTINCT is the hint: this is not the method you are looking for.
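To see why the DISTINCT is a warning sign, it can help to look at the joined rows before they are collapsed. The following is a minimal sketch that reuses the sample data, keeps the join on id_1 only, drops the DISTINCT, and exposes a row number next to the FIRST_VALUE result. The aliases matched_date, picked, and rn are illustrative names, not part of the original query.

```sql
with base_table(primary_id, ID_1, ID_2, Date) as (
    select * from values
    (1, 123, 321, '01/22/2021'::date),
    (2, 123, 654, '09/02/2022'::date),
    (3, 234, 432, '02/02/2019'::date)
), second_table(Desired_Column, ID_1, ID_2, Date) as (
    select * from values
    ('q'    ,123, 321, '01/21/2021'::date),
    ('r'    ,123, 654, '09/03/2022'::date),
    ('w'    ,234, 432, '02/01/2019'::date),
    ('s'    ,234, 432, '03/20/2022'::date),
    ('a'    ,123, 439, '02/20/2022'::date),
    ('w'    ,999, 999, '09/10/2022'::date),
    (null   ,234, 987, '10/10/2020'::date)
)
select b.primary_id,
       s.desired_column,
       s.date as matched_date,   -- illustrative alias
       -- same window as the corrected query: every joined row gets a value
       first_value(s.desired_column)
           over (partition by b.id_1, b.id_2 order by s.date desc) as picked,
       -- rank of each joined row inside its partition, newest first
       row_number()
           over (partition by b.id_1, b.id_2 order by s.date desc) as rn
from base_table as b
left join second_table as s
  on s.id_1 = b.id_1
 and s.date <= b.date
order by b.primary_id, rn;
```

With the sample data, primary_id 2 should come back twice (once joined to 'a' and once to 'q'), both rows carrying the same picked value. The DISTINCT only hides that duplication; it does not choose a row.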

Drop the FIRST_VALUE, which returns a result for every row, and instead use QUALIFY with ROW_NUMBER to rank the rows and keep only the best one (rank 1):

Select b.primary_id,
       s.desired_column
From base_table as b
Left join second_table as s
on s.id_1 = b.id_1 and 
   s.date <= b.date
qualify row_number() over (partition by b.id_1, b.id_2 order by s.date desc) = 1

gives:

| PRIMARY_ID | DESIRED_COLUMN |
|------------|----------------|
| 1          | q              |
| 2          | a              |
| 3          | w              |

This also allows accessing all the other values from the two tables:

Select b.*,
       s.*   
From base_table as b
Left join second_table as s
on s.id_1 = b.id_1 and 
   s.date <= b.date
qualify row_number() over (partition by b.id_1, b.id_2 order by s.date desc) = 1

And given that you want to match both IDs, you should use this SQL:

Select b.primary_id,
    b.id_1,
    b.id_2,
    s.desired_column 
From base_table as b
Left join second_table as s
on s.id_1 = b.id_1 and 
   s.id_2 = b.id_2 and 
   s.date <= b.date
qualify row_number() over (partition by b.id_1, b.id_2 order by s.date desc) = 1
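One possible refinement, offered as a sketch rather than as part of the original answer: partitioning by b.id_1, b.id_2 assumes that each (id_1, id_2) pair appears only once in base_table. If a pair can repeat with different dates, partitioning by b.primary_id keeps exactly one row per base row. The fourth base row below is hypothetical and was added only to illustrate that case.

```sql
with base_table(primary_id, ID_1, ID_2, Date) as (
    select * from values
    (1, 123, 321, '01/22/2021'::date),
    (2, 123, 654, '09/02/2022'::date),
    (3, 234, 432, '02/02/2019'::date),
    (4, 123, 321, '01/20/2021'::date)   -- hypothetical: same ID pair as row 1, earlier date
), second_table(Desired_Column, ID_1, ID_2, Date) as (
    select * from values
    ('q'    ,123, 321, '01/21/2021'::date),
    ('r'    ,123, 654, '09/03/2022'::date),
    ('w'    ,234, 432, '02/01/2019'::date),
    ('s'    ,234, 432, '03/20/2022'::date),
    ('a'    ,123, 439, '02/20/2022'::date),
    ('w'    ,999, 999, '09/10/2022'::date),
    (null   ,234, 987, '10/10/2020'::date)
)
select b.primary_id,
       b.id_1,
       b.id_2,
       s.desired_column
from base_table as b
left join second_table as s
  on s.id_1 = b.id_1
 and s.id_2 = b.id_2
 and s.date <= b.date
-- one partition per base row, so each base row keeps its own best match
qualify row_number() over (partition by b.primary_id order by s.date desc) = 1
order by b.primary_id;
-- expected desired_column per primary_id: 1 -> q, 2 -> NULL, 3 -> w, 4 -> NULL
```

On the original three-row sample the two partitioning choices should give the same result; the difference only matters once base_table contains repeated ID pairs.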
