简体   繁体   中英

How to find a social network by BigQuery (SQL)?

There is a fact table about shopping with below attributes.

CustomerId ProductId
C1 P1
C1 P3
C2 P2
C2 P3
C3 P4
C4 P2
C5 P4
C5 P6
C6 P6

It describes which Customer purchased what product.

Now, I want to build a cluster with same interest.

For e.g.


C1 bought P1 & P3.
P3 bought by C2 as well, so C1 & C2 have common interest because both bought P3. 

Now C2 also bought P2 and P2 bought by C4 as well. 
So C2 and C4 also have common interest because both bought P2. 

Thus, C1 is connected to C2 and C2 connect to C4. 
hence C1, C2 & C4 all together forms a network.

I want an output like this, Where NetworkId should be an unique id for each network.

NetworkId CustomerId
N1 C1
N1 C2
N1 C4
N2 C3
N2 C5
N2 C6

It seems like a graph problem but I am trying to solve it using BigQuery (SQL), any suggestion would be appreciated.

Thanks in advance.

Consider below approach

with recursive init as (
  select distinct least(t1.CustomerId, t2.CustomerId) id1, greatest(t1.CustomerId, t2.CustomerId) id2
  from your_table t1
  join your_table t2
  on t1.ProductId = t2.ProductId
  and t1.CustomerId != t2.CustomerId
), iterations as (
  select id1 networkId, id1, [id1] net from init where id1 not in (select distinct id2 from init) 
  union all 
  select networkId, id2, net || [id2]
  from iterations a
  join init b
  using(id1)
)
select row_number() over() networkId, array (
  select distinct id
  from t.net id
) CustomerId
from (
  select networkId, array_concat_agg(net) net
  from iterations
  group by networkId
) t

         

if applied to sample data in your question - output is

在此处输入图像描述

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM