BigQuery - JOIN on two tables using string and array

We have two tables in BigQuery, as shown below:

Table A

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | a
 Bob  | Interest  | b
 Sue  | Interest  | a
 Sue  | Interest  | c
 Joe  | Interest  | a
 Joe  | Interest  | b
 Joe  | Interest  | c
 Joe  | Interest  | d

Table B (Static)

           Interests                        |   Segment
--------------------------------------------+------------------
["a"]                                       |   S1
["a","b"]                                   |   S2 
["a", "b", "c", "d"]                        |   S3

Expected table

 Name | Question  | Answer
 -----+-----------+-------
 Bob  | Interest  | a
 Bob  | Interest  | b
 Sue  | Interest  | a
 Sue  | Interest  | c
 Joe  | Interest  | a
 Joe  | Interest  | b
 Joe  | Interest  | c
 Joe  | Interest  | d
          (+)
 Bob  | Segment   | S1
 Bob  | Segment   | S2
 Sue  | Segment   | S1
 Joe  | Segment   | S1
 Joe  | Segment   | S2
 Joe  | Segment   | S3 

In the tables above, the Answer field is a string and Interests is an array of strings.

Pointers:

  1. One user can have one or more interests.
  2. A segment is made up of one or more interests.
  3. A user is assigned to a segment only when every interest in that segment's Interests array appears among the user's answers.
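To make rule 3 concrete, here is a minimal Python sketch that applies it as a subset test to the sample data. It is only a local check of the rule, not BigQuery code; `segment_rows` and the tuple layout are hypothetical.

```python
# Sample data from the question (Table A rows and Table B rows).
table_a = [
    ("Bob", "Interest", "a"), ("Bob", "Interest", "b"),
    ("Sue", "Interest", "a"), ("Sue", "Interest", "c"),
    ("Joe", "Interest", "a"), ("Joe", "Interest", "b"),
    ("Joe", "Interest", "c"), ("Joe", "Interest", "d"),
]
table_b = [(["a"], "S1"), (["a", "b"], "S2"), (["a", "b", "c", "d"], "S3")]


def segment_rows(rows, segments):
    """Hypothetical helper: one (name, 'Segment', segment) row for each
    segment whose interests are all among the user's answers."""
    answers = {}
    for name, _question, answer in rows:
        answers.setdefault(name, set()).add(answer)
    result = []
    for name, user_answers in answers.items():
        for interests, segment in segments:
            # Subset test: every interest of the segment must be answered;
            # extra answers (Bob's "b" for S1) do not disqualify the user.
            if set(interests) <= user_answers:
                result.append((name, "Segment", segment))
    return result


print(segment_rows(table_a, table_b))
```

This yields exactly the six Segment rows of the expected table: Bob gets S1 and S2, Sue only S1 (she has no "b"), and Joe all three.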

Any inputs or thoughts pointing in the right direction would be greatly appreciated.

The query below is for BigQuery Standard SQL:

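#standardSQL
-- First branch: the original rows of table A
select name, question, answer from `project.dataset.tableA`
union all
-- Second branch: one row per (user, segment) the user qualifies for
select name, 'Segment', segment
from (
  select
    name, segment,
    -- count the segment's interests that have no matching answer;
    -- the user qualifies only if none are missing
    ( select countif(y is null)
      from unnest(b.interests) x
      left join unnest(a.answers) y
      on x = y
    ) = 0 qualified
  from (
    -- collect each user's answers into an array
    select name, array_agg(answer) answers
    from `project.dataset.tableA`
    group by name
  ) a, `project.dataset.tableB` b
)
where qualified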

Applied to the sample data in your question, this returns the expected table (row order may differ, since the query has no ORDER BY).
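The query's logic can be mirrored step by step in plain Python to check it against the sample data. This is a sketch, not BigQuery code; `union_with_segments` and the tuple layout are hypothetical. Each user's answers are aggregated into a list (array_agg), every user is paired with every segment (the comma cross join), and a pair qualifies when no segment interest is left unmatched (countif(y is null) = 0).

```python
from collections import defaultdict

# Sample data from the question.
table_a = [
    ("Bob", "Interest", "a"), ("Bob", "Interest", "b"),
    ("Sue", "Interest", "a"), ("Sue", "Interest", "c"),
    ("Joe", "Interest", "a"), ("Joe", "Interest", "b"),
    ("Joe", "Interest", "c"), ("Joe", "Interest", "d"),
]
table_b = [(["a"], "S1"), (["a", "b"], "S2"), (["a", "b", "c", "d"], "S3")]


def union_with_segments(rows, segments):
    """Hypothetical helper mirroring the query: Table A rows UNION ALL
    one (name, 'Segment', segment) row per qualifying pair."""
    # Subquery a: select name, array_agg(answer) answers ... group by name
    answers = defaultdict(list)
    for name, _question, answer in rows:
        answers[name].append(answer)

    out = list(rows)  # first branch of UNION ALL: Table A unchanged
    # ", tableB b": pair every user with every segment
    for name, user_answers in answers.items():
        for interests, segment in segments:
            # A left join of x (interest) to y (answer) on x = y leaves y null
            # exactly once per unmatched interest; countif(y is null) counts them
            missing = sum(1 for x in interests if x not in user_answers)
            if missing == 0:  # "= 0 qualified ... where qualified"
                out.append((name, "Segment", segment))
    return out


for row in union_with_segments(table_a, table_b):
    print(row)
```

Because only unmatched interests produce a null, duplicate answers for the same user do not affect the count.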

This looks like a union all, where the second query unnests the interests and joins:

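select a.name, a.question, a.answer
from a
union all
-- one row per (user, segment) where every interest of the segment matches an answer
select a.name, 'Segment', b.segment
from a join
     (b cross join
      unnest(b.interests) b_interest
     )
     on a.answer = b_interest
group by a.name, b.segment
having count(distinct b_interest) = any_value(array_length(b.interests));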

You might need a left join if some interests don't have segments.
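A join-based check can be modeled the same way: join answers to the unnested interests, group by (user, segment), and keep a pair only when the number of distinct matched interests equals the segment's array length. A minimal Python model of that join-and-count idea follows; `join_and_count` is a hypothetical name, and it assumes each Interests array has no duplicate values (otherwise the array length overstates the number of distinct interests).

```python
from collections import defaultdict

# Sample data from the question (same shape as Table A and Table B).
table_a = [
    ("Bob", "Interest", "a"), ("Bob", "Interest", "b"),
    ("Sue", "Interest", "a"), ("Sue", "Interest", "c"),
    ("Joe", "Interest", "a"), ("Joe", "Interest", "b"),
    ("Joe", "Interest", "c"), ("Joe", "Interest", "d"),
]
table_b = [(["a"], "S1"), (["a", "b"], "S2"), (["a", "b", "c", "d"], "S3")]


def join_and_count(rows, segments):
    """Hypothetical helper: keep (user, segment) pairs whose distinct matched
    interests cover the whole Interests array. Assumes no duplicates in it."""
    matched = defaultdict(set)  # (name, segment) -> distinct matched interests
    sizes = {}                  # segment -> number of interests in its array
    for interests, segment in segments:
        sizes[segment] = len(interests)
        for b_interest in interests:            # cross join unnest(interests)
            for name, _question, answer in rows:
                if answer == b_interest:        # join on answer = b_interest
                    matched[(name, segment)].add(b_interest)
    # Equivalent of: group by name, segment
    #                having count(distinct b_interest) = array length
    return sorted(
        (name, "Segment", segment)
        for (name, segment), hits in matched.items()
        if len(hits) == sizes[segment]
    )


print(join_and_count(table_a, table_b))
```

Counting distinct matches, rather than matched rows, keeps a user who answered the same interest twice from being counted as covering more of the segment than they do.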

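Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.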

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM