
BigQuery: Subquery with UNION as ARRAY

I have the following two example tables:

Orders table

order_id  linked_order1  linked_order2
1001      L005           null
1002      null           null
1003      L006           L007

Invoices table

order_id  linked_order_id  charge
1001      null             4.27
1002      null             9.82
1003      null             7.42
null      L005             2.12
null      L006             1.76
null      L007             3.20

I need to join these so that the charges of all the orders (linked and otherwise) can be shown as part of a single order row. My desired output is something like this:

Desired output

order_id  linked_order1  linked_order2  invoices.charge  invoices.order_id  invoices.linked_order_id
1001      L005           null           4.27             1001               null
                                        2.12             null               L005
1002      null           null           9.82             1002               null
1003      L006           L007           7.42             1003               null
                                        1.76             null               L006
                                        3.20             null               L007

I can manage to get the main order's invoice into the table as follows:

SELECT
  orders,
  ARRAY(
    SELECT AS STRUCT * FROM `invoices_table` WHERE order_id = orders.order_id) AS invoice
FROM
  `orders_table` AS orders
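To see why this query only returns each order's own invoice, here is a minimal plain-Python model of the correlated ARRAY subquery. It is an illustration, not BigQuery itself: it assumes the example tables above with SQL NULL represented as None, and the helper names `sql_eq` and `invoices_for` are hypothetical.

```python
# Plain-Python model (not BigQuery) of the correlated subquery
#   ARRAY(SELECT AS STRUCT * FROM invoices_table WHERE order_id = orders.order_id)
# SQL NULL is represented as None.

ORDERS = [
    {"order_id": 1001, "linked_order1": "L005", "linked_order2": None},
    {"order_id": 1002, "linked_order1": None, "linked_order2": None},
    {"order_id": 1003, "linked_order1": "L006", "linked_order2": "L007"},
]
INVOICES = [
    {"order_id": 1001, "linked_order_id": None, "charge": 4.27},
    {"order_id": 1002, "linked_order_id": None, "charge": 9.82},
    {"order_id": 1003, "linked_order_id": None, "charge": 7.42},
    {"order_id": None, "linked_order_id": "L005", "charge": 2.12},
    {"order_id": None, "linked_order_id": "L006", "charge": 1.76},
    {"order_id": None, "linked_order_id": "L007", "charge": 3.20},
]


def sql_eq(a, b):
    """Like SQL '=' in a WHERE clause: never true if either side is NULL."""
    return a is not None and b is not None and a == b


def invoices_for(order):
    """Invoice rows the subquery returns for one outer order row."""
    return [inv for inv in INVOICES if sql_eq(inv["order_id"], order["order_id"])]


charges = {o["order_id"]: [inv["charge"] for inv in invoices_for(o)] for o in ORDERS}
print(charges)  # {1001: [4.27], 1002: [9.82], 1003: [7.42]}
```

The linked invoices (2.12, 1.76 and 3.20) never appear, because their order_id is NULL and a comparison with NULL is never true.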

I can run a separate query that unions all of the invoice results for given order IDs into a single table, but I can't combine it with the query above without getting errors.

Something like this:

SELECT
  orders,
  ARRAY(
    SELECT AS STRUCT * FROM
      (SELECT * FROM `invoices_table` WHERE order_id = orders.order_id
       UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id = orders.linked_order1
       UNION ALL SELECT * FROM `invoices_table` WHERE linked_order_id = orders.linked_order2)
  ) AS invoice
FROM
  `orders_table` AS orders

But this gives me a correlated subquery error.

[Update]

This was much simpler than I thought. The following query gives me what I was after:

SELECT
  orders,
  ARRAY(
    SELECT AS STRUCT * FROM `invoices_table`
    WHERE order_id = orders.order_id
       OR linked_order_id IN (orders.linked_order1, orders.linked_order2)) AS invoice
FROM
  `orders_table` AS orders
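A minimal plain-Python model of this corrected predicate (an illustration, not BigQuery), assuming the example tables with NULL represented as None. The hypothetical helpers `sql_eq` and `sql_in` mimic how `=` and `IN` in a WHERE clause never match a NULL value, which is why the NULL linked_order2 of order 1001 is harmless.

```python
# Plain-Python model (not BigQuery) of the corrected ARRAY subquery:
#   WHERE order_id = orders.order_id
#      OR linked_order_id IN (orders.linked_order1, orders.linked_order2)
# SQL NULL is represented as None.

ORDERS = [
    {"order_id": 1001, "linked_order1": "L005", "linked_order2": None},
    {"order_id": 1002, "linked_order1": None, "linked_order2": None},
    {"order_id": 1003, "linked_order1": "L006", "linked_order2": "L007"},
]
INVOICES = [
    {"order_id": 1001, "linked_order_id": None, "charge": 4.27},
    {"order_id": 1002, "linked_order_id": None, "charge": 9.82},
    {"order_id": 1003, "linked_order_id": None, "charge": 7.42},
    {"order_id": None, "linked_order_id": "L005", "charge": 2.12},
    {"order_id": None, "linked_order_id": "L006", "charge": 1.76},
    {"order_id": None, "linked_order_id": "L007", "charge": 3.20},
]


def sql_eq(a, b):
    """Like SQL '=' in a WHERE clause: never true if either side is NULL."""
    return a is not None and b is not None and a == b


def sql_in(value, candidates):
    """Like SQL 'value IN (...)' in a WHERE clause: NULLs never match."""
    return any(sql_eq(value, c) for c in candidates)


def invoices_for(order):
    """Invoice rows the subquery returns for one outer order row."""
    linked = (order["linked_order1"], order["linked_order2"])
    return [
        inv
        for inv in INVOICES
        if sql_eq(inv["order_id"], order["order_id"])
        or sql_in(inv["linked_order_id"], linked)
    ]


charges = {o["order_id"]: [inv["charge"] for inv in invoices_for(o)] for o in ORDERS}
print(charges)  # {1001: [4.27, 2.12], 1002: [9.82], 1003: [7.42, 1.76, 3.2]}
```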

Using a CROSS JOIN:

SELECT o.*, ARRAY_AGG(i) invoices
  FROM Orders o, Invoices i 
 WHERE o.order_id = i.order_id 
    OR i.linked_order_id IN (o.linked_order1, o.linked_order2)
 GROUP BY 1, 2, 3;
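The comma in `FROM Orders o, Invoices i` is a cross join, so this query pairs every order with every invoice, filters the pairs with the WHERE clause and then aggregates per order. The plain-Python sketch below models those three steps (it is not BigQuery). Order 1004 is a hypothetical extra order with no invoices, added only to show that it disappears from the result: the WHERE filter makes the cross join behave like an inner join, whereas the correlated ARRAY(...) subquery in the question would keep such an order with an empty array.

```python
from itertools import product

# Plain-Python model (not BigQuery); SQL NULL is represented as None.
ORDERS = [
    {"order_id": 1001, "linked_order1": "L005", "linked_order2": None},
    {"order_id": 1002, "linked_order1": None, "linked_order2": None},
    {"order_id": 1003, "linked_order1": "L006", "linked_order2": "L007"},
    {"order_id": 1004, "linked_order1": None, "linked_order2": None},  # hypothetical, no invoices
]
INVOICES = [
    {"order_id": 1001, "linked_order_id": None, "charge": 4.27},
    {"order_id": 1002, "linked_order_id": None, "charge": 9.82},
    {"order_id": 1003, "linked_order_id": None, "charge": 7.42},
    {"order_id": None, "linked_order_id": "L005", "charge": 2.12},
    {"order_id": None, "linked_order_id": "L006", "charge": 1.76},
    {"order_id": None, "linked_order_id": "L007", "charge": 3.20},
]


def matches(o, i):
    """WHERE o.order_id = i.order_id OR i.linked_order_id IN (o.linked_order1, o.linked_order2)"""
    if i["order_id"] is not None and i["order_id"] == o["order_id"]:
        return True
    lid = i["linked_order_id"]
    return lid is not None and lid in (o["linked_order1"], o["linked_order2"])


groups = {}
for o, i in product(ORDERS, INVOICES):  # FROM Orders o, Invoices i (cross join)
    if matches(o, i):  # WHERE ...
        key = (o["order_id"], o["linked_order1"], o["linked_order2"])  # GROUP BY 1, 2, 3
        groups.setdefault(key, []).append(i["charge"])  # ARRAY_AGG (charges only, for brevity)

print(groups)  # order 1004 has no group at all
```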
[Image: query results]

[UPDATE]

Sometimes a query that uses OR conditions in the WHERE clause can perform poorly on large datasets, because the OR cannot be evaluated as a single equality join and the engine may end up comparing every order with every invoice. In that case you can try the query below instead, which produces the same result.

SELECT o.*, ARRAY_AGG(i) invoices FROM (
  SELECT o, i FROM Orders o JOIN Invoices i USING (order_id)
   UNION ALL
  SELECT o, i FROM Orders o JOIN Invoices i ON i.linked_order_id IN (o.linked_order1, o.linked_order2)
) GROUP BY 1, 2, 3;
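A plain-Python sketch of why this rewrite can help: each UNION ALL branch is a pure equality join, which can be answered with hash lookups instead of testing every (order, invoice) pair. The dictionaries `by_order_id` and `by_linked_id` are a hypothetical stand-in for a hash join, not a description of BigQuery's internals; NULL is represented as None.

```python
from collections import defaultdict

# Plain-Python model (not BigQuery); SQL NULL is represented as None.
ORDERS = [
    {"order_id": 1001, "linked_order1": "L005", "linked_order2": None},
    {"order_id": 1002, "linked_order1": None, "linked_order2": None},
    {"order_id": 1003, "linked_order1": "L006", "linked_order2": "L007"},
]
INVOICES = [
    {"order_id": 1001, "linked_order_id": None, "charge": 4.27},
    {"order_id": 1002, "linked_order_id": None, "charge": 9.82},
    {"order_id": 1003, "linked_order_id": None, "charge": 7.42},
    {"order_id": None, "linked_order_id": "L005", "charge": 2.12},
    {"order_id": None, "linked_order_id": "L006", "charge": 1.76},
    {"order_id": None, "linked_order_id": "L007", "charge": 3.20},
]

# Hash indexes on each equality key; NULL keys are skipped because they never match.
by_order_id = defaultdict(list)
by_linked_id = defaultdict(list)
for inv in INVOICES:
    if inv["order_id"] is not None:
        by_order_id[inv["order_id"]].append(inv)
    if inv["linked_order_id"] is not None:
        by_linked_id[inv["linked_order_id"]].append(inv)

pairs = []
# Branch 1: Orders o JOIN Invoices i USING (order_id)
for o in ORDERS:
    for inv in by_order_id.get(o["order_id"], []):
        pairs.append((o, inv))
# Branch 2: Orders o JOIN Invoices i ON i.linked_order_id IN (o.linked_order1, o.linked_order2)
for o in ORDERS:
    # dict.fromkeys drops a repeated key, so IN matches each invoice at most once
    for key in dict.fromkeys((o["linked_order1"], o["linked_order2"])):
        for inv in by_linked_id.get(key, []):
            pairs.append((o, inv))

# Outer query: GROUP BY 1, 2, 3 with ARRAY_AGG (charges only, for brevity)
groups = {}
for o, inv in pairs:
    key = (o["order_id"], o["linked_order1"], o["linked_order2"])
    groups.setdefault(key, []).append(inv["charge"])

print(groups)
```

One difference worth checking on real data: if a single invoice matched both an order's order_id and one of its linked orders, UNION ALL would emit it twice, whereas the OR version emits it once. In the example tables each invoice has only one of the two keys set, so both queries agree.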

For the desired output table, a FULL OUTER JOIN is the right tool.

with tblA as (Select order_id, 1 linked_order1, 2   linked_order2, from unnest([1,2,3]) order_id),
tblB as (Select order_id, 109.99 charge from unnest([3,4,5]) order_id
union all select null  order_id, * from unnest([50.1,29.99]) charge
)

Select *  
  from tblA
  full join tblB
  using(order_id)
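A plain-Python model of what `FULL JOIN ... USING (order_id)` returns for this toy data (an illustration, not BigQuery): matched rows are merged into one row with a single order_id column, unmatched rows from either side are kept with NULLs (None) for the other side's columns, and a NULL key never matches anything. The function `full_join_using` is a hypothetical helper.

```python
# Toy data from the query above; SQL NULL is represented as None.
TBL_A = [{"order_id": k, "linked_order1": 1, "linked_order2": 2} for k in (1, 2, 3)]
TBL_B = [{"order_id": k, "charge": 109.99} for k in (3, 4, 5)] + [
    {"order_id": None, "charge": c} for c in (50.1, 29.99)
]


def full_join_using(left, right, key):
    """Model of 'left FULL JOIN right USING (key)'."""
    left_cols = [c for c in left[0] if c != key]
    right_cols = [c for c in right[0] if c != key]

    def row(k, a, b):
        # USING emits a single key column; the missing side's columns become None
        out = {key: k}
        out.update({c: a[c] if a else None for c in left_cols})
        out.update({c: b[c] if b else None for c in right_cols})
        return out

    rows, matched = [], set()
    for a in left:
        hit = False
        for j, b in enumerate(right):
            if a[key] is not None and a[key] == b[key]:  # NULL keys never match
                rows.append(row(a[key], a, b))
                matched.add(j)
                hit = True
        if not hit:
            rows.append(row(a[key], a, None))  # left-only row
    for j, b in enumerate(right):
        if j not in matched:
            rows.append(row(b[key], None, b))  # right-only row
    return rows


rows = full_join_using(TBL_A, TBL_B, "order_id")
for r in rows:
    print(r)
```

The two invoices with a NULL order_id come back as separate right-only rows, which is why the next answer needs extra join keys to attach them to an order.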

Your setting needs several join conditions. Therefore each row of the first table is used three times, once per join key: order_id, linked_order1 and linked_order2 are unnested into tmp_id.

with tblA as (Select order_id, "L05" linked_order1, "L2"    linked_order2, from unnest(["1","2","3"]) order_id),
tblB as (Select order_id, null linked_order_id, 109.99 charge from unnest(["3","4","5"]) order_id
union all select null  order_id, "L05" , * from unnest([50.1,29.99]) charge
)

Select A.order_id,linked_order1,linked_order2, array_agg(struct(tblB.order_id,linked_order_id,charge))
  from 
  (
    Select * from tblA, unnest([order_id,linked_order1,linked_order2]) as tmp_id
  ) A
  full join tblB
  on tmp_id = ifnull(tblB.order_id,linked_order_id)
  where charge is not null #or tmp_id=A.order_id
  group by 1,2,3
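The same query modelled step by step in plain Python, using the toy data above with NULL represented as None (an illustration, not BigQuery): unnest each order into three join keys, full join on `IFNULL(order_id, linked_order_id)`, drop rows without a charge, then group. The helper name `join_key` is hypothetical.

```python
# Toy data from the query above; SQL NULL is represented as None.
TBL_A = [{"order_id": k, "linked_order1": "L05", "linked_order2": "L2"} for k in ("1", "2", "3")]
TBL_B = [{"order_id": k, "linked_order_id": None, "charge": 109.99} for k in ("3", "4", "5")] + [
    {"order_id": None, "linked_order_id": "L05", "charge": c} for c in (50.1, 29.99)
]

# 1) tblA, UNNEST([order_id, linked_order1, linked_order2]) AS tmp_id
exploded = [
    (a, tmp_id)
    for a in TBL_A
    for tmp_id in (a["order_id"], a["linked_order1"], a["linked_order2"])
]


def join_key(b):
    """IFNULL(tblB.order_id, linked_order_id)"""
    return b["order_id"] if b["order_id"] is not None else b["linked_order_id"]


# 2) FULL JOIN tblB ON tmp_id = IFNULL(...); an unmatched side is paired with None
joined, matched = [], set()
for a, tmp_id in exploded:
    hit = False
    for j, b in enumerate(TBL_B):
        if tmp_id is not None and tmp_id == join_key(b):
            joined.append((a, b))
            matched.add(j)
            hit = True
    if not hit:
        joined.append((a, None))
joined += [(None, b) for j, b in enumerate(TBL_B) if j not in matched]

# 3) WHERE charge IS NOT NULL, GROUP BY 1, 2, 3,
#    ARRAY_AGG(STRUCT(tblB.order_id, linked_order_id, charge))
groups = {}
for a, b in joined:
    if b is None or b["charge"] is None:
        continue
    key = (a["order_id"], a["linked_order1"], a["linked_order2"]) if a else (None, None, None)
    groups.setdefault(key, []).append((b["order_id"], b["linked_order_id"], b["charge"]))

for key, structs in groups.items():
    print(key, structs)
```

Invoices that match no order (order_id 4 and 5 here) survive the FULL JOIN and end up in one group whose order columns are all NULL. Note also that SQL does not guarantee the element order inside ARRAY_AGG unless an ORDER BY is given, whereas this model keeps insertion order.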

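Statement: the technical posts on this site are distributed under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.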

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM