Alternative to self-join
I have a table, supplynetwork, with four columns: CustomerID, SupplierID, Supplier_productID, and Purchase_Year.
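I want to build customer pairs in which both customers bought the same product from the same supplier in the same year. I am doing this in BigQuery with a self-join, but it is too slow. Is there an alternative?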
select distinct
a.CustomerID as focal_CustomerID,
b.CustomerID as linked_CustomerID,
a.Purchase_Year,
a.Supplier_productID
from
supplynetwork as a,
supplynetwork as b
where
a.CustomerID<>b.CustomerID and
a.Purchase_Year=b.Purchase_Year and
a.Supplier_productID=b.Supplier_productID and
a.SupplierID=b.SupplierID
Use explicit JOIN syntax and index the CustomerID column:
select distinct
a.CustomerID as focal_CustomerID,
b.CustomerID as linked_CustomerID,
a.Purchase_Year,
a.Supplier_productID
from
supplynetwork as a join
supplynetwork as b
on
a.Purchase_Year=b.Purchase_Year and
a.Supplier_productID=b.Supplier_productID and
a.SupplierID=b.SupplierID
where a.CustomerID<>b.CustomerID
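One caveat on the indexing advice: BigQuery does not offer conventional secondary indexes of the kind a row-store database uses to speed up joins. The closest built-in mechanism is table clustering. The statement below is only a sketch of that idea; the dataset name `mydataset` and the target table name `supplynetwork_clustered` are hypothetical, and whether clustering actually makes this self-join cheaper is something to measure rather than assume.

```sql
-- Sketch: build a copy of the table clustered on the join keys.
-- `mydataset` and `supplynetwork_clustered` are made-up names for illustration.
create table `mydataset.supplynetwork_clustered`
cluster by SupplierID, Supplier_productID, Purchase_Year
as
select CustomerID, SupplierID, Supplier_productID, Purchase_Year
from `mydataset.supplynetwork`;
```

The clustering columns are the three columns used in the join condition, since those are the ones on which rows must match; CustomerID only appears in the inequality filter.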
You can use aggregation to collect all customers that meet the condition into a single row:
select Purchase_Year, Supplier_productID, SupplierID,
array_agg(distinct CustomerID) as customers
from supplynetwork sn
group by Purchase_Year, Supplier_productID, SupplierID;
Then you can use array operations to get the pairs:
with pss as (
select Purchase_Year, Supplier_productID, SupplierID,
array_agg(distinct CustomerID) as customers
from supplynetwork sn
group by Purchase_Year, Supplier_productID, SupplierID
)
select c1, c2, pss.*
from pss cross join
unnest(pss.customers) c1 cross join
unnest(pss.customers) c2
where c1 < c2;
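To see what the aggregate-then-unnest query returns, here is a small self-contained sketch that replaces the real table with four made-up rows (the customer, supplier, and product values are invented for illustration). Note that `c1 < c2` yields each pair once, whereas the `<>` filter in the question yields both orderings of every pair; swap in `c1 <> c2` if both directions are needed.

```sql
-- Inline sample data standing in for the real supplynetwork table (values are made up).
with supplynetwork as (
  select 'C1' as CustomerID, 'S1' as SupplierID, 'P1' as Supplier_productID, 2020 as Purchase_Year union all
  select 'C2', 'S1', 'P1', 2020 union all
  select 'C3', 'S1', 'P1', 2020 union all
  select 'C4', 'S2', 'P9', 2021
),
-- One row per (year, product, supplier) holding the distinct customers of that group.
pss as (
  select Purchase_Year, Supplier_productID, SupplierID,
         array_agg(distinct CustomerID) as customers
  from supplynetwork
  group by Purchase_Year, Supplier_productID, SupplierID
)
-- Pair up customers inside each group only; c1 < c2 keeps one ordering per pair.
select c1 as focal_CustomerID,
       c2 as linked_CustomerID,
       pss.Purchase_Year,
       pss.Supplier_productID
from pss cross join
     unnest(pss.customers) as c1 cross join
     unnest(pss.customers) as c2
where c1 < c2
order by focal_CustomerID, linked_CustomerID;
```

With this sample data the result has three rows, (C1, C2), (C1, C3), and (C2, C3), all for year 2020 and product P1. C4 is the only customer in its group, so it produces no pair. The pairing work happens per group rather than across the whole table, which is the reason this form can be cheaper than the self-join.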
You can use a CROSS JOIN, which, even though it is Cartesian, has the benefit of simplicity. Try the query below and see whether it is cheaper than your baseline:
select
focal_CustomerID,
linked_CustomerID,
Purchase_Year,
Supplier_ProductID
from (
select
SupplierID,
Supplier_ProductID,
Purchase_Year,
array_agg(distinct CustomerID) as Customers
from `mydataset.mytable`
group by 1,2,3
), unnest(Customers) focal_CustomerID
cross join unnest(Customers) linked_CustomerID
where focal_CustomerID != linked_CustomerID