
How to find a social network by BigQuery (SQL)?

There is a fact table about purchases with the following attributes.

CustomerId ProductId
C1 P1
C1 P3
C2 P2
C2 P3
C3 P4
C4 P2
C5 P4
C5 P6
C6 P6

It describes which customer bought which product.

Now I want to build clusters of customers who share the same interests.

For example:


C1 bought P1 and P3.
P3 was also bought by C2, so C1 and C2 have a common interest because both bought P3.

C2 also bought P2, and P2 was bought by C4 as well.
So C2 and C4 also have a common interest because both bought P2.

Thus C1 is connected to C2, and C2 is connected to C4.
Hence C1, C2 and C4 together form one network.
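
The pairwise "common interest" relation described above can be listed directly with a self-join. The following is only a minimal sketch, not part of the original post: it inlines the sample rows as a CTE called your_table (a placeholder name), and the output column names customer_a, customer_b and shared_product are illustrative assumptions.

```sql
-- Sample rows from the question, inlined as a CTE (your_table is a placeholder name).
with your_table as (
  select 'C1' as CustomerId, 'P1' as ProductId union all
  select 'C1', 'P3' union all
  select 'C2', 'P2' union all
  select 'C2', 'P3' union all
  select 'C3', 'P4' union all
  select 'C4', 'P2' union all
  select 'C5', 'P4' union all
  select 'C5', 'P6' union all
  select 'C6', 'P6'
)
-- One row per shared product for each pair of distinct customers.
-- The "<" comparison lists a pair as (a, b) only, never also as (b, a).
select
  t1.CustomerId as customer_a,
  t2.CustomerId as customer_b,
  t1.ProductId as shared_product
from your_table t1
join your_table t2
  on t1.ProductId = t2.ProductId
  and t1.CustomerId < t2.CustomerId
order by customer_a, customer_b
-- Rows for the sample data:
--   C1, C2, P3
--   C2, C4, P2
--   C3, C5, P4
--   C5, C6, P6
```

These pairs are the edges of the graph; a network is a set of customers linked by a chain of such pairs.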

I want an output like this, where NetworkId should be a unique ID for each network.

NetworkId CustomerId
N1 C1
N1 C2
N1 C4
N2 C3
N2 C5
N2 C6

This looks like a graph problem, but I am trying to solve it with BigQuery (SQL). Any suggestions would be much appreciated.

Thanks in advance.

Consider the approach below

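with recursive init as (
  -- pairs of distinct customers that bought the same product,
  -- stored once per pair with id1 < id2
  select distinct least(t1.CustomerId, t2.CustomerId) id1, greatest(t1.CustomerId, t2.CustomerId) id2
  from your_table t1
  join your_table t2
  on t1.ProductId = t2.ProductId
  and t1.CustomerId != t2.CustomerId
), iterations as (
  -- start from customers that never appear as id2 in any pair;
  -- each one seeds a network and a path array (net)
  select id1 networkId, id1, [id1] net from init where id1 not in (select distinct id2 from init)
  union all
  -- follow pairs from id1 to id2 (smaller to larger id only),
  -- appending each newly reached customer to the path
  select networkId, id2, net || [id2]
  from iterations a
  join init b
  using(id1)
)
-- merge all paths per starting customer, remove duplicate ids,
-- and number the resulting networks
select row_number() over() networkId, array (
  select distinct id
  from t.net id
) CustomerId
from (
  select networkId, array_concat_agg(net) net
  from iterations
  group by networkId
) t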

         

If applied to the sample data in your question, the output is

[image: query output]
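
The question asks for one row per customer with an "N"-prefixed network ID, while the query above returns one array of customers per network. A minimal sketch of that reshaping step follows. It is an illustration, not part of the original answer: the networks CTE is a hypothetical stand-in for the result of the query above, filled with the grouping from the desired output in the question, and the actual numbering from row_number() over() is not guaranteed to match.

```sql
-- Hypothetical stand-in for the result of the query above:
-- one row per network, with its customers in an array.
with networks as (
  select 1 as networkId, ['C1', 'C2', 'C4'] as CustomerId union all
  select 2, ['C3', 'C5', 'C6']
)
-- unnest turns each array element into its own row;
-- concat builds the N1, N2, ... labels from the numeric id.
select
  concat('N', cast(n.networkId as string)) as NetworkId,
  c as CustomerId
from networks n, unnest(n.CustomerId) as c
order by n.networkId, c
```

In practice the final select of the query above could be wrapped in a CTE and used in place of the inlined networks rows.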

