
How to join two tables by dependent match keys in BigQuery?

I have two tables in BigQuery. The first one is a list of rates. For each code - offer combination there is a default rate with source equal to -1. In addition, some code - offer combinations have rates for a specific source.

The second table has the same columns as the first one, except that instead of rate it carries other data.

My goal is to join rates on matching code - offer - source, and otherwise to use the default rate for the matching code - offer with source equal to -1.

In the example below, the query returns default rates only:

    WITH t1 AS (SELECT 21 as source, 'SA' as code, 'offer1' as offer, 2.4 as rate 
            UNION ALL
            SELECT 33, 'SA', 'offer1', 2.5
            UNION ALL
            SELECT 39, 'SA', 'offer1', 2.1
            UNION ALL
            SELECT -1, 'SA', 'offer1', 3
            UNION ALL
            SELECT -1, 'SA', 'offer2', 4
            UNION ALL
            SELECT 47, 'YN', 'offer1', 2.7
            UNION ALL
            SELECT -1, 'YN', 'offer1', 5.4
            UNION ALL
            SELECT -1, 'YN', 'offer2', 0.9
            UNION ALL
            SELECT -1, 'RE', 'offer1', 5.7
            UNION ALL
            SELECT -1, 'RE', 'offer2', 3.4),
t2 as (SELECT 21 as source, 'SA' as code, 'offer1' as offer, "any data" as other_columns
        UNION ALL SELECT 21, 'SA', 'offer1', "any data"
        UNION ALL SELECT 21, 'SA', 'offer1', "any data"
        UNION ALL SELECT 21, 'SA', 'offer2', "any data"
        UNION ALL SELECT 47, 'YN', 'offer1', "any data"
        UNION ALL SELECT 47, 'YN', 'offer2', "any data"
        UNION ALL SELECT 50, 'YN', 'offer1', "any data"
        UNION ALL SELECT 47, 'YN', 'offer2', "any data"
        UNION ALL SELECT 78, 'RE', 'offer1', "any data"
        UNION ALL SELECT 66, 'RE', 'offer2', "any data")
        
        
SELECT t2.*, rate FROM t2
LEFT JOIN t1 ON t1.offer = t2.offer AND t1.code = t2.code AND IF (t1.source = t1.source AND rate IS NULL, t1.source = t2.source, t1.source = - 1)

The next query returns rates where the source is specified, and null when the source did not match:

SELECT t2.*, rate FROM t2
    LEFT JOIN t1 ON t1.offer = t2.offer AND t1.code = t2.code AND IF (t1.source = t1.source AND rate IS NOT NULL, t1.source = t2.source, t1.source = - 1) 

How can I join the rates correctly?

You can left join twice and use conditional logic:

select t2.*, coalesce(t11.rate, t12.rate) as rate
from t2
-- first try the rate for the exact source
left join t1 t11
    on  t11.code = t2.code
    and t11.offer = t2.offer
    and t11.source = t2.source
-- otherwise fall back to the default rate (source = -1)
left join t1 t12
    on  t12.code = t2.code
    and t12.offer = t2.offer
    and t12.source = -1
    and t11.code is null
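
To see the fallback in isolation, here is a minimal sketch on a trimmed-down version of the sample data. It assumes each code - offer - source combination appears at most once in t1; the CTE names specific_rates and default_rates are illustrative and not from the question. Splitting t1 up front lets coalesce alone pick the winner, so the extra "t11.code is null" condition is not needed in this form.

```sql
#standardSQL
with t1 as (
  select 21 as source, 'SA' as code, 'offer1' as offer, 2.4 as rate
  union all select -1, 'SA', 'offer1', 3.0
),
t2 as (
  select 21 as source, 'SA' as code, 'offer1' as offer
  union all select 50, 'SA', 'offer1'
),
-- hypothetical helper CTEs: rates tied to a source vs. default rates
specific_rates as (select * from t1 where source <> -1),
default_rates as (select * from t1 where source = -1)
select t2.*, coalesce(s.rate, d.rate) as rate
from t2
left join specific_rates s
  on s.code = t2.code and s.offer = t2.offer and s.source = t2.source
left join default_rates d
  on d.code = t2.code and d.offer = t2.offer
-- source 21 gets its own rate 2.4; source 50 falls back to the default 3.0
```

Both joins match at most one rate row per t2 row under the stated assumption, so duplicate rows in t2 are preserved.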

Below is for BigQuery Standard SQL:

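#standardSQL
select any_value(t2).*,
  -- among all rates for this code/offer, prefer the exact source, then the default (-1)
  array_agg(rate order by t1.source = t2.source desc, t1.source = -1 desc limit 1)[offset(0)] rate
from t2
left join t1
on  t1.code = t2.code
and t1.offer = t2.offer
-- one group per distinct t2 row
group by format('%t', t2)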

If applied to the sample data from your question, the output is as below:

[image: query output]

The above avoids the double join. The only side effect is that the result is deduplicated: duplicate rows that are present in table 2 are collapsed into one.
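
The deduplication comes from the grouping key: format('%t', t2) renders the whole row as a string, so identical rows share one key. A minimal sketch on two identical rows (the row_key and row_count aliases are illustrative, not part of the query above):

```sql
#standardSQL
with t2 as (
  select 21 as source, 'SA' as code, 'offer1' as offer
  union all select 21, 'SA', 'offer1'
)
select format('%t', t2) as row_key, count(*) as row_count
from t2
group by row_key
-- returns a single row with row_count = 2
```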

I need the duplicate rows.

Sure, with almost no changes the query above gives you all rows:

#standardSQL
select any_value(t2).*,
  array_agg(rate order by t1.source = t2.source desc, t1.source = -1 desc limit 1)[offset(0)] rate
-- r attaches a random value to each t2 row, so identical rows (almost surely) land in separate groups
from t2, unnest([rand()]) as r
left join t1
on  t1.code = t2.code
and t1.offer = t2.offer
group by format('%t', t2), r

with output:

[image: query output]
