SQL - Recursive Tree Hierarchy with a Record at Each Level

Question

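I am trying to build a classic hierarchy tree in SQL using SAS (which, as far as I know, does not support WITH RECURSIVE).

Here is the simplified data structure of the existing table: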

|USER_ID|SUPERVISOR_ID|

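So, to build the hierarchy, you simply join the table to itself x times on SUPERVISOR_ID = USER_ID to get the data you need. In my company that is 16 levels.

The problem appears when trying to get a terminating branch for each user. For example, suppose User A at level 1 has Users B, C, D, and E under them at level 2. With a repeated LEFT JOIN you would get: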

| -- Level 1 -- | -- Level 2 -- |
     User A          User B
     User A          User C
     User A          User D
     User A          User E

The issue is that User A has no terminating branch of its own. The end result needed is:

| -- Level 1 -- | -- Level 2 -- |
     User A           NULL         
     User A          User B
     User A          User C
     User A          User D
     User A          User E

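My first-blush thought is that I could get around this by creating a temp table at each level and then running a UNION ALL over all the results, but that seems terribly inefficient given the size (16 levels), and I am hoping I am missing a cleaner solution.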
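For concreteness, here is a minimal sketch of that UNION ALL idea for just the two levels in the example above. The table name USERS, the output name TWO_LEVELS, and the single-letter IDs are made up for illustration; the real table only needs the USER_ID and SUPERVISOR_ID columns.

```sas
/* Hypothetical sample data in the USER_ID / SUPERVISOR_ID layout.
   A period is read as a missing (blank) character value. */
data users;
  length USER_ID SUPERVISOR_ID $1;
  input USER_ID $ SUPERVISOR_ID $;
  datalines;
A .
B A
C A
D A
E A
;
run;

proc sql;
  create table two_levels as
  /* terminating row: each top-level user with an empty level 2 */
  select u1.USER_ID as LEVEL_1, ' ' as LEVEL_2 length=1
  from users u1
  where u1.SUPERVISOR_ID is missing
  union all
  /* one row per direct report of a top-level user */
  select u1.USER_ID as LEVEL_1, u2.USER_ID as LEVEL_2
  from users u1
  inner join users u2
    on u2.SUPERVISOR_ID = u1.USER_ID
  where u1.SUPERVISOR_ID is missing
  order by LEVEL_1, LEVEL_2
  ;
quit;
```

Each extra level would need one more branch of the UNION ALL with one more join, which is where the concern about 16 levels comes from.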

Answer 1

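I'm not quite sure I understand the question, but if you are trying to generate a full listing of all employees under each supervisor, then this is one way of doing it, assuming each employee has a unique ID that can appear in either the user or the supervisor column: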

data employees;
input SUPERVISOR_ID USER_ID;
cards;
1 2
1 3
1 4
2 5
2 6
2 7
7 8
;
run;

proc sql;
  create view distinct_employees as 
  select distinct SUPERVISOR_ID as USER_ID from employees
  union
  select distinct USER_ID from employees;
quit;

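data hierarchy;
  /* compile-time only: define SUPERVISOR_ID in the PDV without reading rows */
  if 0 then set employees;
  set distinct_employees;
  if _n_ = 1 then do;
    /* lookup table: USER_ID -> SUPERVISOR_ID */
    declare hash h(dataset:'employees');
    rc = h.definekey('USER_ID');
    rc = h.definedata('SUPERVISOR_ID');
    rc = h.definedone();
  end;
  /* remember the employee we started from */
  T_USER_ID = USER_ID;
  /* climb the tree: one output row per ancestor of the starting employee */
  do while(h.find() = 0);
    USER_ID = T_USER_ID;
    output;
    /* next lookup uses the supervisor as the key */
    USER_ID = SUPERVISOR_ID;
  end;
  drop rc T_USER_ID;
run;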

proc sort data = hierarchy;
  by SUPERVISOR_ID USER_ID;
run;

Answer 2

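Consider some simple process P that creates a rectangle of possible paths from a set of (super_id, user_id) pairs.

A path of length N is N levels deep and links (N-1) relations.

Are the values at each level distinct to that level?

No? P will find cycles, crossovers, and wrap-arounds in addition to the actual paths. A wrap-around occurs when a node that sits at level > 1 of an actual path is "discovered" as a level = 1 node.
Yes? P will find paths, crossover paths, and wrap-around paths. Additional data constraints or rules could help eliminate them.

Consider 4 simple paths whose level values are not distinct: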

data path(keep=L1-L4) rels(keep=super_id user_id);
  array L(4);
  input L(*);
  output path;
  super_id = L(1);
  do i = 2 to dim(L);
    user_id = L(i);
    output rels;
    super_id = user_id;
  end;
datalines;
1 3 1 4
1 5 1 4
2 3 2 3
1 2 3 4
;
run;

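There are only 12 relation rows. For each pair, neither the path it came from nor the level it occupied is known.

Here is an explicit two-stage query that assembles 4-level paths from the relations. If the code works, it can be abstracted into a macro.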

proc sql;

  * RELS cross RELS, extensive i/o;
  * get on the induction ladder;

  create table ITER_1 as
  select distinct
    S.super_id as L3 /* parent^2 */
  , S.user_id as L2 /* parent */ 
  , U.user_id as L1 /* leaf */
  from RELS U
  cross join RELS S 
  where S.user_id = U.super_id
  order by L3, L2, L1
  ;

  * ITER_1 cross RELS, little less extensive i/o;
  * if you see the inductive variation you can macroize it;

  create table ITER_2 as
  select distinct
    S.super_id as L4 /* parent^3 */
  , U.L3 /* parent^2 */
  , U.L2 /* parent */
  , U.L1 /* leaf */
  from ITER_1 U
  cross join RELS S
  where S.user_id = U.L3
  order by L4, L3, L2, L1
  ;
quit;
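The inductive variation between ITER_1 and ITER_2 could be macroized roughly as follows. This is only a sketch: the macro name build_paths, its parameters, and the PATH_ output prefix are hypothetical, and it assumes levels is at least 3.

```sas
/* Hypothetical macro: builds PATH_1 .. PATH_(levels-2) from a relation table */
%macro build_paths(rels=RELS, levels=4, prefix=PATH_);
  %local n i;
  proc sql;
    /* Base step: two joined relations form a 3-level path (L3 > L2 > L1) */
    create table &prefix.1 as
    select distinct
      S.super_id as L3
    , S.user_id  as L2
    , U.user_id  as L1
    from &rels U
    cross join &rels S
    where S.user_id = U.super_id
    ;
    /* Inductive step: each iteration prepends one more ancestor level */
    %do n = 2 %to %eval(&levels - 2);
      create table &prefix.&n as
      select distinct
        S.super_id as L%eval(&n + 2)
        %do i = %eval(&n + 1) %to 1 %by -1;
          , U.L&i
        %end;
      from &prefix.%eval(&n - 1) U
      cross join &rels S
      where S.user_id = U.L%eval(&n + 1)
      ;
    %end;
  quit;
%mend build_paths;

/* levels=4 generates the same two statements as the hand-written version */
%build_paths(rels=RELS, levels=4)
```

With levels=4 the generated SQL should match the two statements above apart from the table names and the omitted ORDER BY clauses.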

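The assembler above has no knowledge of pair identity and cannot restrict itself to paths built from distinct pairs, so there will be cycles, crossovers, and wrap-arounds.

Paths found (some explained)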

 1 : 1 2 3 1   path 4 L3 xover to path 1 L2
 2 : 1 2 3 2   path 4 L3 xover to path 3 L2
 3 : 1 2 3 4   actual
 4 : 1 3 1 2   path 1 L3 xover to path 4 L1
 5 : 1 3 1 3
 6 : 1 3 1 4   actual
 7 : 1 3 1 5
 8 : 1 3 2 3
 9 : 1 5 1 2
10 : 1 5 1 3
11 : 1 5 1 4   actual
12 : 1 5 1 5
13 : 2 3 1 2
14 : 2 3 1 3
15 : 2 3 1 4
16 : 2 3 1 5
17 : 2 3 2 3   actual is actually a cycler too
18 : 3 1 2 3
19 : 3 1 3 1
20 : 3 1 3 2
21 : 3 1 3 4
22 : 3 1 5 1
23 : 3 2 3 1
24 : 3 2 3 2
25 : 3 2 3 4
26 : 5 1 2 3
27 : 5 1 3 1
28 : 5 1 3 2
29 : 5 1 3 4
30 : 5 1 5 1   path 2 L3 cycled to path 2 L1

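If the ids at each relation level are never found at any other level, cycles are eliminated implicitly. Crossovers cannot be eliminated because there is no path identity information. The same goes for wrap-arounds.

More complex SQL can ensure that each relation in a found "path" occurs only once and that the path contents are distinct. Depending on the actual data, you may still end up with a large number of false paths.

Highly regular code lends itself to macroization, but the actual SQL run time depends heavily on the actual data and on the indexing of the RELS data set.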

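/* Note: this version assumes RELS also carries a unique row_id for each
   relation row; the RELS built earlier keeps only super_id and user_id. */
proc sql;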

create table ITER_1 as
select 
  L3 /* parent^2 */
, L2 /* parent */ 
, L1 /* leaf */
, R1
, R2
from 
(
  select distinct
    S.super_id as L3 /* parent^2 */
  , S.user_id as L2 /* parent */ 
  , U.user_id as L1 /* leaf */
  , U.row_id as R1
  , S.row_id as R2
  , monotonic() as seq
  from RELS U
  cross join RELS S 
  where S.user_id = U.super_id
    and S.row_id < U.row_id  /* triangular constraint allowed due to symmetry */
)
group by L3, L2, L1
having seq = min(seq)
order by L3, L2, L1
;

create table ITER_2 as
select
  L4 /* parent^3 */ format=6.
, L3 /* parent^2 */ format=6.
, L2 /* parent */ format=6.
, L1 /* leaf */ format=6.
, R1 format=6.
, R2 format=6.
, R3 format=6.
from
(
  select distinct
    S.super_id as L4 /* parent^3 */ format=6.
  , U.L3 /* parent^2 */ format=6.
  , U.L2 /* parent */ format=6.
  , U.L1 /* leaf */ format=6.
  , U.R1 format=6.
  , U.R2 format=6.
  , S.row_id as R3 format=6.
  , monotonic() as seq
  from ITER_1 U
  cross join RELS S
  where S.user_id = U.L3
    and S.row_id ne R1
    and S.row_id ne R2
)
group by L4, L3, L2, L1
having seq = min(seq)
order by L4, L3, L2, L1
;

quit;

The final tweak for the NULL items will need more SQL.
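One possible shape for that extra SQL, offered only as a sketch: emit every prefix of each found path, padded with missing values, and let UNION remove the duplicates. It assumes ITER_2 exists with numeric columns L4 (the root) through L1 (the leaf); the output table name is made up.

```sas
proc sql;
  create table PATHS_WITH_TERMINATORS as
  /* full 4-level paths */
  select L4, L3, L2, L1 from ITER_2
  union
  /* terminating row for each 3-level prefix */
  select L4, L3, L2, . as L1 from ITER_2
  union
  /* terminating row for each 2-level prefix */
  select L4, L3, . as L2, . as L1 from ITER_2
  union
  /* terminating row for each root */
  select L4, . as L3, . as L2, . as L1 from ITER_2
  order by L4, L3, L2, L1
  ;
quit;
```

Because SAS sorts missing values first, each terminating row lands ahead of the rows it terminates. Note the limitation: a branch shorter than 4 levels never appears in ITER_2, so it gets no rows here either.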

Can the discovered hierarchy be processed without needing the NULLs? A DATA step SET with BY processing can use the LAST. variables to detect the end of a level.
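A minimal sketch of that BY-group idea, assuming ITER_2 with columns L4 through L1 as built above. The output data set names and the end_of_* flag variables are hypothetical.

```sas
/* BY processing requires the data to be sorted by the BY variables */
proc sort data=ITER_2 out=paths_sorted;
  by L4 L3 L2 L1;
run;

data branch_ends;
  set paths_sorted;
  by L4 L3 L2 L1;
  /* last.X is 1 on the final row of the group defined by X
     and all BY variables listed before it */
  end_of_L2_group = last.L2;
  end_of_L3_group = last.L3;
  end_of_L4_group = last.L4;
run;
```

Downstream logic can then react to the flags instead of looking for a NULL terminator row.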
