
Alternative to self-join

I have a table, supplynetwork, with four columns:

CustomerID, SupplierID, Supplier_productID, Purchase_Year

I want to construct customer pairs in which both customers purchase the same product from the same supplier in the same (focal) year. I use a self-join to do this in BigQuery, but it is too slow. Is there an alternative?

select distinct
  a.CustomerID as focal_CustomerID,
  b.CustomerID as linked_CustomerID,
  a.Purchase_Year,
  a.Supplier_productID
from 
  supplynetwork as a,
  supplynetwork as b
where 
  a.CustomerID<>b.CustomerID and
  a.Purchase_Year=b.Purchase_Year and
  a.Supplier_productID=b.Supplier_productID and
  a.SupplierID=b.SupplierID
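To see what this self-join returns, here is a minimal sketch that runs the same query on a tiny sample in SQLite through Python's built-in sqlite3 module. The rows are hypothetical, and SQLite only stands in for BigQuery so that the example can run locally. Every matching pair comes back twice, once in each order. A group of n customers sharing a (supplier, product, year) produces n * (n - 1) join rows, which is why the query slows down on large groups.

```python
import sqlite3

# Hypothetical sample rows: (CustomerID, SupplierID, Supplier_productID, Purchase_Year)
ROWS = [
    ("C1", "S1", "P1", 2020),
    ("C2", "S1", "P1", 2020),
    ("C3", "S1", "P1", 2020),
    ("C1", "S1", "P2", 2020),  # only C1 buys P2 in 2020, so no pair
    ("C2", "S1", "P1", 2021),  # nobody else buys P1 in 2021, so no pair
]

# The query from the question, unchanged apart from an ORDER BY for stable output.
SELF_JOIN = """
select distinct
  a.CustomerID as focal_CustomerID,
  b.CustomerID as linked_CustomerID,
  a.Purchase_Year,
  a.Supplier_productID
from supplynetwork as a, supplynetwork as b
where a.CustomerID <> b.CustomerID
  and a.Purchase_Year = b.Purchase_Year
  and a.Supplier_productID = b.Supplier_productID
  and a.SupplierID = b.SupplierID
order by 1, 2
"""

def load(rows):
    """Create an in-memory supplynetwork table filled with the given rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table supplynetwork ("
        "CustomerID text, SupplierID text, Supplier_productID text, Purchase_Year integer)"
    )
    conn.executemany("insert into supplynetwork values (?, ?, ?, ?)", rows)
    return conn

conn = load(ROWS)
pairs = conn.execute(SELF_JOIN).fetchall()
for row in pairs:
    print(row)
```

With three customers in the (S1, P1, 2020) group, the query returns 3 * 2 = 6 rows, including both ("C1", "C2", ...) and ("C2", "C1", ...).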

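Use explicit JOIN syntax and index the CustomerID column. (Note that BigQuery has no conventional indexes for speeding up joins; its closest counterpart is table clustering.)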

select distinct
  a.CustomerID as focal_CustomerID,
  b.CustomerID as linked_CustomerID,
  a.Purchase_Year,
  a.Supplier_productID
from
  supplynetwork as a
  join supplynetwork as b
    on  a.Purchase_Year = b.Purchase_Year
    and a.Supplier_productID = b.Supplier_productID
    and a.SupplierID = b.SupplierID
where
  a.CustomerID <> b.CustomerID
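Whatever the join syntax, two changes usually make a self-join like this one smaller. The first is deduplicating the purchase rows before joining, so repeat purchases do not multiply the join output. The second is keeping only a.CustomerID < b.CustomerID, so each unordered pair is produced once instead of twice. The sketch below applies both to hypothetical sample data, again using SQLite through Python's sqlite3 module as a stand-in for BigQuery. It is a swapped-in variant, not the answer above. Because it returns half as many rows as the original query, use it only if (A, B) and (B, A) do not both need to appear.

```python
import sqlite3

# Hypothetical sample rows; C1 buys P1 from S1 twice in 2020 (a repeat purchase).
ROWS = [
    ("C1", "S1", "P1", 2020),
    ("C1", "S1", "P1", 2020),
    ("C2", "S1", "P1", 2020),
    ("C3", "S1", "P1", 2020),
    ("C1", "S1", "P2", 2020),
    ("C2", "S1", "P1", 2021),
]

DEDUP_JOIN = """
with dedup as (
  -- one row per customer per (supplier, product, year), before the join
  select distinct CustomerID, SupplierID, Supplier_productID, Purchase_Year
  from supplynetwork
)
select distinct
  a.CustomerID as focal_CustomerID,
  b.CustomerID as linked_CustomerID,
  a.Purchase_Year,
  a.Supplier_productID
from dedup as a
join dedup as b
  on  a.Purchase_Year = b.Purchase_Year
  and a.Supplier_productID = b.Supplier_productID
  and a.SupplierID = b.SupplierID
where a.CustomerID < b.CustomerID  -- each unordered pair once
order by 1, 2
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table supplynetwork ("
    "CustomerID text, SupplierID text, Supplier_productID text, Purchase_Year integer)"
)
conn.executemany("insert into supplynetwork values (?, ?, ?, ?)", ROWS)
pairs = conn.execute(DEDUP_JOIN).fetchall()
print(pairs)
```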

You can use aggregation to collect all customers who share the same purchase year, product and supplier into a single row:

select Purchase_Year, Supplier_productID, SupplierID,
       array_agg(distinct CustomerID) as customers
from supplynetwork sn
group by Purchase_Year, Supplier_productID, SupplierID;
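One way to check what this aggregation produces before running it on the real table is a plain-Python model on hypothetical rows. Here a `defaultdict(set)` plays the role of `array_agg(distinct CustomerID)`, keyed by the three GROUP BY columns. This sketch only mirrors the SQL semantics and says nothing about how BigQuery executes the query.

```python
from collections import defaultdict

# Hypothetical rows: (CustomerID, SupplierID, Supplier_productID, Purchase_Year)
ROWS = [
    ("C1", "S1", "P1", 2020),
    ("C1", "S1", "P1", 2020),  # repeat purchase, collapsed by the set
    ("C2", "S1", "P1", 2020),
    ("C3", "S1", "P1", 2020),
    ("C1", "S1", "P2", 2020),
    ("C2", "S1", "P1", 2021),
]

def group_customers(rows):
    """Model of: select Purchase_Year, Supplier_productID, SupplierID,
    array_agg(distinct CustomerID) ... group by those three columns."""
    groups = defaultdict(set)
    for customer, supplier, product, year in rows:
        groups[(year, product, supplier)].add(customer)
    # array_agg gives no ordering guarantee; sort here only for deterministic output.
    return {key: sorted(customers) for key, customers in groups.items()}

groups = group_customers(ROWS)
for key, customers in sorted(groups.items()):
    print(key, customers)
```

Each purchase row is read once. The table is not joined to itself, so this step grows linearly with the number of rows.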

You can then get pairs using array operations:

with pss as (
      select Purchase_Year, Supplier_productID, SupplierID,
             array_agg(distinct CustomerID) as customers
      from supplynetwork sn
      group by Purchase_Year, Supplier_productID, SupplierID
     )
select c1, c2, pss.*
from pss cross join
     unnest(pss.customers) c1 cross join
     unnest(pss.customers) c2
where c1 < c2;
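The two `unnest()` cross joins filtered by `c1 < c2` emit every unordered pair of distinct customers within a group exactly once. Over a sorted list, that is the same as Python's `itertools.combinations(..., 2)`. The sketch below models this on a hypothetical `pss` result (the dictionary contents are made up). A group of n customers yields n * (n - 1) / 2 pairs.

```python
from itertools import combinations

# Hypothetical pss rows: (Purchase_Year, Supplier_productID, SupplierID) -> customers
PSS = {
    (2020, "P1", "S1"): ["C3", "C1", "C2"],
    (2020, "P2", "S1"): ["C1"],
}

def unordered_pairs(pss):
    """Model of: pss cross join unnest(customers) c1 cross join unnest(customers) c2
    where c1 < c2. Customers are already distinct (array_agg(distinct ...))."""
    out = []
    for (year, product, supplier), customers in pss.items():
        # combinations() over a sorted list yields exactly the pairs with c1 < c2
        for c1, c2 in combinations(sorted(customers), 2):
            out.append((c1, c2, year, product, supplier))
    return out

pairs = unordered_pairs(PSS)
for row in pairs:
    print(row)
```

Groups with a single customer, such as (2020, "P2", "S1"), contribute no rows.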

You can use CROSS JOIN, which, even though it produces a Cartesian product, may give you a benefit in simplicity. Try the query below and see whether it is cheaper than your baseline:

select 
   focal_CustomerID, 
   linked_CustomerID, 
   Purchase_Year, 
   Supplier_ProductID 
from (
  select 
     SupplierID, 
     Supplier_ProductID, 
     Purchase_Year, 
     array_agg(distinct CustomerID) as Customers
  from `mydataset.mytable`
  group by 1,2,3
), unnest(Customers) focal_CustomerID
cross join unnest(Customers) linked_CustomerID
where focal_CustomerID != linked_CustomerID
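This query filters with `!=` rather than `<`, so it keeps both orderings of every pair, matching the output of the original self-join. The sketch below models one group in plain Python with `itertools.permutations`, using hypothetical customer IDs. It also compares the row counts of the `!=` filter and a `<` filter for a group of n distinct customers. If symmetric pairs are enough for your analysis, switching to `<` halves the output.

```python
from itertools import permutations

def ordered_pairs(customers):
    """Model of unnest(Customers) x unnest(Customers) with focal != linked:
    every ordered pair of distinct customers in one group."""
    return list(permutations(customers, 2))

def row_counts(n):
    """Rows per group of n distinct customers: ('!=' filter, '<' filter)."""
    return n * (n - 1), n * (n - 1) // 2

pairs = ordered_pairs(["C1", "C2", "C3"])
print(pairs)
print(row_counts(1000))  # (999000, 499500)
```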


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM