简体   繁体   中英

SQL to ensure unique node names in adjacency list

So I have an adjacency list that forms a hierarchy simulating a versioned file structure. The problem is that the incoming file names are not currently unique and they need to be. To make things slightly more interesting the files may have different versions which should keep the name of the first version (note the versions all have the same NodeID).

Adjacency List

ParentID NodeID VersionNum FileName
-1 1 1 FirstFolder
1 2 1 SecondFolder
1 3 1 ThirdFolder
1 4 1 FirstDocument
1 4 2 FirstDocument
1 5 1 FirstDocument
1 5 2 FirstDocument
2 6 1 FirstDocument
2 6 2 FirstDocument
2 7 1 SecondDocument
3 8 1 SecondDocument
3 9 1 ThirdDocument
3 9 2 ThirdDocument
3 10 1 ThirdDocument
3 11 1 ThirdDocument

Targeted Result

ParentID NodeID VersionNum FileName
-1 1 1 FirstFolder
1 2 1 SecondFolder
1 3 1 ThirdFolder
1 4 1 FirstDocument
1 4 2 FirstDocument
1 5 1 FirstDocument_1
1 5 2 FirstDocument_1
2 6 1 FirstDocument
2 6 2 FirstDocument
2 7 1 SecondDocument
3 8 1 SecondDocument
3 9 1 ThirdDocument
3 9 2 ThirdDocument
3 10 1 ThirdDocument_1
3 11 1 ThirdDocument_2

*I should also note that the folder names are already guaranteed to be unique (they already exist, it is the documents that are incoming) and they only have 1 version.


CREATE TABLE #tmp_tree
(
    ParentID INT,
    NodeID INT,
    VersionNum INT,
    FileName VARCHAR(50),
);

INSERT INTO  #tmp_tree (ParentID, NodeID, VersionNum, FileName)
VALUES (-1, 1, 1, 'FirstFolder' ),
 (1, 2, 1, 'SecondFolder' ),
 (1, 3, 1, 'ThirdFolder' ),
 (1, 4, 1, 'FirstDocument' ),
 (1, 4, 2, 'FirstDocument' ),
 (1, 5, 1, 'FirstDocument' ),
 (1, 5, 2, 'FirstDocument' ),
 (2, 6, 1, 'FirstDocument' ),
 (2, 6, 2, 'FirstDocument' ),
 (2, 7, 1, 'SecondDocument' ),
 (3, 8, 1, 'SecondDocument' ),
 (3, 9, 1, 'ThirdDocument' ),
 (3, 9, 2, 'ThirdDocument' ),
 (3, 10, 1, 'ThirdDocument' )
 (3, 11, 1, 'ThirdDocument' )

I really don't know how to approach this though resorting to a stored procedure. Adjacency list scream CTEs to me but that got me no where real fast. Group By loses the NodeID so while I can find the names of the documents that need to be renamed - I don't know how to use that to select the second occurrence of the name (ordered by NodeID).

-- I don't see how this helps... but this finds the names that need to change.
select ParentID, FileName,VersionNum, count(*) from #tmp_tree 
 GROUP BY ParentID, FileName, VersionNum
 HAVING VersionNum = 1 and count(*) > 1
 order by FileName

I know how to solve this procedural but not declaratively.

I don't know if this is closer or farther away from the solution:

 select f2.*, Row_Number() over (order by f2.FileName) from 
 (select top 10 f.*, count(FileName) over (PARTITION by ParentID, FileName) as n from (select * from #tmp_tree where versionNum = 1) as f
 order by f.ParentID, f.FileName) as f2
 Where n > 1

I would assume the last line (3, 11) in the targeted result is a mistake.

You can find the repeated names with a window function in a subquery and then join it during the update. In short, you can do:

update #tmp_tree
set #tmp_tree.filename = concat(#tmp_tree.filename, '_', x.rn)
from #tmp_tree
join (
  select *,
    row_number() over(partition by parentid, filename order by nodeid) as rn
  from #tmp_tree
  where versionnum = 1
) x on x.rn > 1 and x.nodeid = #tmp_tree.nodeid;

Result:

 ParentID  NodeID  VersionNum  FileName        
 --------- ------- ----------- --------------- 
 -1        1       1           FirstFolder     
 1         2       1           SecondFolder    
 1         3       1           ThirdFolder     
 1         4       1           FirstDocument   
 1         4       2           FirstDocument   
 1         5       1           FirstDocument_2 
 1         5       2           FirstDocument_2 
 2         6       1           FirstDocument   
 2         6       2           FirstDocument   
 2         7       1           SecondDocument  
 3         8       1           SecondDocument  
 3         9       1           ThirdDocument   
 3         9       2           ThirdDocument   
 3         10      1           ThirdDocument_2 

See running example at db<>fiddle .

You don't need to self-join the table , you can update the derived table directly, after calculating the row-number using DENSE_RANK

update x
set filename = concat(x.filename, '_', x.rn)
from (
  select *,
    dense_rank() over(partition by parentid, filename order by nodeid) as rn
  from #tmp_tree
) x
where x.rn > 1;

db<>fiddle

DENSE_RANK will return the same number for tied results according to the ordering clause.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM