How to find all connected subgraphs of an undirected graph
I need some help with a problem that I am struggling to solve.
Example table:
ID |Identifier1 | Identifier2
---------------------------------
1 | a | c
2 | b | f
3 | a | g
4 | c | h
5 | b | j
6 | d | f
7 | e | k
8 | i |
9 | l | h
I want to group identifiers that are related to each other across the two columns and assign each group a unique group ID.
Desired output:
Identifier | Gr_ID | Gr.Members
---------------------------------------------------
a | 1 | (a,c,g,h,l)
b | 2 | (b,d,f,j)
c | 1 | (a,c,g,h,l)
d | 2 | (b,d,f,j)
e | 3 | (e,k)
f | 2 | (b,d,f,j)
g | 1 | (a,c,g,h,l)
h | 1 | (a,c,g,h,l)
j | 2 | (b,d,f,j)
k | 3 | (e,k)
l | 1 | (a,c,g,h,l)
i | 4 | (i)
Note: the column Gr.Members is not necessary; it is mostly there for a clearer view.
So the definition of a group is: a row belongs to a group if it shares at least one identifier with at least one row of this group.
But the group ID has to be assigned to each identifier (selected by the union of the two columns), not to the row.
Any help on how to build a query that gives the desired output?
Thank you.
Update: Below are some extra sample sets with their expected output.
Given table:
Identifier1 | Identifier2
----------------------------
a | f
a | g
a | NULL
b | c
b | a
b | h
b | j
b | NULL
b | NULL
b | g
c | k
c | b
d | l
d | f
d | g
d | m
d | a
d | NULL
d | a
e | c
e | b
e | NULL
Expected output: all the records should belong to the same group with group ID = 1.
Given table:
Identifier1 | Identifier2
--------------------------
a | a
b | b
c | a
c | b
c | c
Expected output: the records should all be in the same group with group ID = 1.
Here is a variant that doesn't use a cursor, but uses a single recursive query.
Essentially, it treats the data as edges in a graph and recursively traverses all edges of the graph, stopping when a loop is detected. Then it collects the identifiers reached from each starting point into groups and gives each group a number.
See the detailed explanation of how it works below. I recommend running the query CTE-by-CTE and examining each intermediate result to understand what it does.
Sample 1
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(2, 'b', 'b'),
(3, 'c', 'a'),
(4, 'c', 'b'),
(5, 'c', 'c');
Sample 2
I added one more row with the value z to have multiple rows with unpaired values.
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'a'),
(1, 'a', 'c'),
(2, 'b', 'f'),
(3, 'a', 'g'),
(4, 'c', 'h'),
(5, 'b', 'j'),
(6, 'd', 'f'),
(7, 'e', 'k'),
(8, 'i', NULL),
(88, 'z', 'z'),
(9, 'l', 'h');
Sample 3
DECLARE @T TABLE (ID int, Ident1 char(1), Ident2 char(1));
INSERT INTO @T (ID, Ident1, Ident2) VALUES
(1, 'a', 'f'),
(2, 'a', 'g'),
(3, 'a', NULL),
(4, 'b', 'c'),
(5, 'b', 'a'),
(6, 'b', 'h'),
(7, 'b', 'j'),
(8, 'b', NULL),
(9, 'b', NULL),
(10, 'b', 'g'),
(11, 'c', 'k'),
(12, 'c', 'b'),
(13, 'd', 'l'),
(14, 'd', 'f'),
(15, 'd', 'g'),
(16, 'd', 'm'),
(17, 'd', 'a'),
(18, 'd', NULL),
(19, 'd', 'a'),
(20, 'e', 'c'),
(21, 'e', 'b'),
(22, 'e', NULL);
Query
WITH
CTE_Idents
AS
(
SELECT Ident1 AS Ident
FROM @T
UNION
SELECT Ident2 AS Ident
FROM @T
)
,CTE_Pairs
AS
(
SELECT Ident1, Ident2
FROM @T
WHERE Ident1 <> Ident2
UNION
SELECT Ident2 AS Ident1, Ident1 AS Ident2
FROM @T
WHERE Ident1 <> Ident2
)
,CTE_Recursive
AS
(
SELECT
CAST(CTE_Idents.Ident AS varchar(8000)) AS AnchorIdent
, Ident1
, Ident2
, CAST(',' + Ident1 + ',' + Ident2 + ',' AS varchar(8000)) AS IdentPath
, 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Idents ON CTE_Idents.Ident = CTE_Pairs.Ident1
UNION ALL
SELECT
CTE_Recursive.AnchorIdent
, CTE_Pairs.Ident1
, CTE_Pairs.Ident2
, CAST(CTE_Recursive.IdentPath + CTE_Pairs.Ident2 + ',' AS varchar(8000)) AS IdentPath
, CTE_Recursive.Lvl + 1 AS Lvl
FROM
CTE_Pairs
INNER JOIN CTE_Recursive ON CTE_Recursive.Ident2 = CTE_Pairs.Ident1
WHERE
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
)
,CTE_RecursionResult
AS
(
SELECT AnchorIdent, Ident1, Ident2
FROM CTE_Recursive
)
,CTE_CleanResult
AS
(
SELECT AnchorIdent, Ident1 AS Ident
FROM CTE_RecursionResult
UNION
SELECT AnchorIdent, Ident2 AS Ident
FROM CTE_RecursionResult
)
SELECT
CTE_Idents.Ident
,CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END AS GroupMembers
,DENSE_RANK() OVER(ORDER BY
CASE WHEN CA_Data.XML_Value IS NULL
THEN CTE_Idents.Ident ELSE CA_Data.XML_Value END
) AS GroupID
FROM
CTE_Idents
CROSS APPLY
(
SELECT CTE_CleanResult.Ident+','
FROM CTE_CleanResult
WHERE CTE_CleanResult.AnchorIdent = CTE_Idents.Ident
ORDER BY CTE_CleanResult.Ident FOR XML PATH(''), TYPE
) AS CA_XML(XML_Value)
CROSS APPLY
(
SELECT CA_XML.XML_Value.value('.', 'NVARCHAR(MAX)')
) AS CA_Data(XML_Value)
WHERE
CTE_Idents.Ident IS NOT NULL
ORDER BY Ident;
Result 1
+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a | a,b,c, | 1 |
| b | a,b,c, | 1 |
| c | a,b,c, | 1 |
+-------+--------------+---------+
Result 2
+-------+--------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------+---------+
| a | a,c,g,h,l, | 1 |
| b | b,d,f,j, | 2 |
| c | a,c,g,h,l, | 1 |
| d | b,d,f,j, | 2 |
| e | e,k, | 3 |
| f | b,d,f,j, | 2 |
| g | a,c,g,h,l, | 1 |
| h | a,c,g,h,l, | 1 |
| i | i | 4 |
| j | b,d,f,j, | 2 |
| k | e,k, | 3 |
| l | a,c,g,h,l, | 1 |
| z | z | 5 |
+-------+--------------+---------+
Result 3
+-------+--------------------------+---------+
| Ident | GroupMembers | GroupID |
+-------+--------------------------+---------+
| a | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| b | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| c | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| d | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| e | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| f | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| g | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| h | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| j | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| k | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| l | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
| m | a,b,c,d,e,f,g,h,j,k,l,m, | 1 |
+-------+--------------------------+---------+
I'll use the second set of sample data for this explanation.
CTE_Idents
CTE_Idents gives the list of all Identifiers that appear in either the Ident1 or the Ident2 column. Since they can appear in any order, we UNION both columns together. UNION also removes any duplicates. The NULL row shown below is filtered out later, in the final SELECT.
+-------+
| Ident |
+-------+
| NULL |
| a |
| b |
| c |
| d |
| e |
| f |
| g |
| h |
| i |
| j |
| k |
| l |
| z |
+-------+
CTE_Pairs
CTE_Pairs gives the list of all edges of the graph in both directions. Again, UNION is used to remove any duplicates. The filter Ident1 <> Ident2 drops self-pairs such as (z, z) as well as rows where Ident2 is NULL.
+--------+--------+
| Ident1 | Ident2 |
+--------+--------+
| a | c |
| a | g |
| b | f |
| b | j |
| c | a |
| c | h |
| d | f |
| e | k |
| f | b |
| f | d |
| g | a |
| h | c |
| h | l |
| j | b |
| k | e |
| l | h |
+--------+--------+
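For readers who want to cross-check this step outside the database, here is a minimal Python sketch of what CTE_Pairs computes. It is not part of the original answer; the names sample_rows and directed_pairs are made up for illustration, and None stands in for SQL NULL.

```python
# Rows of sample 2 as (Ident1, Ident2); None stands in for SQL NULL.
sample_rows = [
    ("a", "a"), ("a", "c"), ("b", "f"), ("a", "g"), ("c", "h"), ("b", "j"),
    ("d", "f"), ("e", "k"), ("i", None), ("z", "z"), ("l", "h"),
]

def directed_pairs(rows):
    """Return every edge in both directions, without self-pairs or NULLs."""
    pairs = set()  # a set removes duplicates, like UNION
    for left, right in rows:
        # In SQL, Ident1 <> Ident2 is not true when either side is NULL,
        # so those rows are filtered out together with the self-pairs.
        if left is None or right is None or left == right:
            continue
        pairs.add((left, right))
        pairs.add((right, left))
    return sorted(pairs)

pairs = directed_pairs(sample_rows)
print(len(pairs))  # 16, the number of rows in the table above
```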
CTE_Recursive
CTE_Recursive is the main part of the query; it recursively traverses the graph starting from each unique Identifier. These starting rows are produced by the first part of UNION ALL. The second part of UNION ALL recursively joins to itself, linking Ident2 to Ident1. Since we pre-made CTE_Pairs with all edges written in both directions, we can always link only Ident2 to Ident1 and still get all paths in the graph. At the same time the query builds IdentPath, a string of comma-delimited Identifiers that have been traversed so far. It is used in the WHERE filter:
CTE_Recursive.IdentPath NOT LIKE CAST('%,' + CTE_Pairs.Ident2 + ',%' AS varchar(8000))
As soon as we come across an Identifier that has already been included in the path, the recursion stops along that branch, since it cannot reach any new node. AnchorIdent is the starting Identifier for the recursion; it is used later to group the results. Lvl is not really used; I included it to make it easier to see what is going on.
+-------------+--------+--------+-------------+-----+
| AnchorIdent | Ident1 | Ident2 | IdentPath | Lvl |
+-------------+--------+--------+-------------+-----+
| a | a | c | ,a,c, | 1 |
| a | a | g | ,a,g, | 1 |
| b | b | f | ,b,f, | 1 |
| b | b | j | ,b,j, | 1 |
| c | c | a | ,c,a, | 1 |
| c | c | h | ,c,h, | 1 |
| d | d | f | ,d,f, | 1 |
| e | e | k | ,e,k, | 1 |
| f | f | b | ,f,b, | 1 |
| f | f | d | ,f,d, | 1 |
| g | g | a | ,g,a, | 1 |
| h | h | c | ,h,c, | 1 |
| h | h | l | ,h,l, | 1 |
| j | j | b | ,j,b, | 1 |
| k | k | e | ,k,e, | 1 |
| l | l | h | ,l,h, | 1 |
| l | h | c | ,l,h,c, | 2 |
| l | c | a | ,l,h,c,a, | 3 |
| l | a | g | ,l,h,c,a,g, | 4 |
| j | b | f | ,j,b,f, | 2 |
| j | f | d | ,j,b,f,d, | 3 |
| h | c | a | ,h,c,a, | 2 |
| h | a | g | ,h,c,a,g, | 3 |
| g | a | c | ,g,a,c, | 2 |
| g | c | h | ,g,a,c,h, | 3 |
| g | h | l | ,g,a,c,h,l, | 4 |
| f | b | j | ,f,b,j, | 2 |
| d | f | b | ,d,f,b, | 2 |
| d | b | j | ,d,f,b,j, | 3 |
| c | h | l | ,c,h,l, | 2 |
| c | a | g | ,c,a,g, | 2 |
| b | f | d | ,b,f,d, | 2 |
| a | c | h | ,a,c,h, | 2 |
| a | h | l | ,a,c,h,l, | 3 |
+-------------+--------+--------+-------------+-----+
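The traversal can be mimicked in a few lines of Python, which may help when examining the intermediate result. This is only a sketch under the assumption that the recursive member behaves like a level-by-level expansion; walk_paths and the tuple layout are hypothetical names, not part of the original query. Each output row corresponds to one simple path, so the row count grows with the number of paths rather than the number of nodes, which is likely why this approach slows down on dense graphs.

```python
# Undirected edges of sample 2, written once each.
edges = [("a", "c"), ("a", "g"), ("b", "f"), ("b", "j"),
         ("c", "h"), ("d", "f"), ("e", "k"), ("h", "l")]
# Both directions, like CTE_Pairs.
pairs = sorted(set(edges) | {(b, a) for a, b in edges})

def walk_paths(pairs):
    """Extend each path edge by edge until it would revisit a node."""
    # Anchor rows: (AnchorIdent, Ident1, Ident2, path so far, Lvl).
    frontier = [(a, a, b, [a, b], 1) for a, b in pairs]
    result = []
    while frontier:
        result.extend(frontier)
        next_frontier = []
        for anchor, _, tail, path, lvl in frontier:
            for a, b in pairs:
                # Same idea as the NOT LIKE filter on IdentPath.
                if a == tail and b not in path:
                    next_frontier.append((anchor, a, b, path + [b], lvl + 1))
        frontier = next_frontier
    return result

rows = walk_paths(pairs)
print(len(rows))  # 34, the number of rows in the table above
```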
CTE_CleanResult
CTE_CleanResult keeps only the relevant parts of CTE_Recursive and again merges Ident1 and Ident2 using UNION.
+-------------+-------+
| AnchorIdent | Ident |
+-------------+-------+
| a | a |
| a | c |
| a | g |
| a | h |
| a | l |
| b | b |
| b | d |
| b | f |
| b | j |
| c | a |
| c | c |
| c | g |
| c | h |
| c | l |
| d | b |
| d | d |
| d | f |
| d | j |
| e | e |
| e | k |
| f | b |
| f | d |
| f | f |
| f | j |
| g | a |
| g | c |
| g | g |
| g | h |
| g | l |
| h | a |
| h | c |
| h | g |
| h | h |
| h | l |
| j | b |
| j | d |
| j | f |
| j | j |
| k | e |
| k | k |
| l | a |
| l | c |
| l | g |
| l | h |
| l | l |
+-------------+-------+
Final SELECT
Now we need to build a string of comma-separated Ident values for each AnchorIdent. CROSS APPLY with FOR XML does it. DENSE_RANK() calculates the GroupID numbers for each AnchorIdent.
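The string building and ranking in the final SELECT can be pictured with the following Python sketch. It is an illustration only: number_groups is a hypothetical helper, and Python's string ordering stands in for the database collation, which may sort differently.

```python
def number_groups(idents, clean_result):
    """idents: all non-NULL identifiers; clean_result: (AnchorIdent, Ident) rows."""
    members = {}
    for anchor, ident in clean_result:
        members.setdefault(anchor, set()).add(ident)
    # Identifiers without any edge fall back to themselves, like the CASE expression.
    keys = {
        i: "".join(m + "," for m in sorted(members[i])) if i in members else i
        for i in idents
    }
    # DENSE_RANK: equal member strings share a number and numbers have no gaps.
    rank = {key: n for n, key in enumerate(sorted(set(keys.values())), start=1)}
    return {i: (keys[i], rank[keys[i]]) for i in sorted(idents)}

# A small subset of the data: one group (a, c), one group (e, k), and a lone i.
clean = [("a", "a"), ("a", "c"), ("c", "a"), ("c", "c"),
         ("e", "e"), ("e", "k"), ("k", "e"), ("k", "k")]
groups = number_groups(["a", "c", "e", "k", "i"], clean)
for ident, (member_list, group_id) in groups.items():
    print(ident, member_list, group_id)
```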
This script produces the outputs for test sets 1, 2 and 3 as required. Notes on the algorithm are included as comments in the script.
Be aware:
In the script, the input set is #tree, so using this script requires inserting the source data into #tree.
This algorithm does not work with NULL values for nodes. Replace NULL values with CHAR(0) when inserting into #tree, using ISNULL(source_col,CHAR(0)), to circumvent this shortcoming. When selecting from the final result, replace CHAR(0) with NULL using NULLIF(node,CHAR(0)).
Note that the answer using recursive CTEs is more elegant in that it is a single SQL statement, but for large input sets recursive CTEs may give abysmal execution times (see this comment on that answer). The solution described below, while more convoluted, should run much faster for large input sets.
SET NOCOUNT ON;
CREATE TABLE #tree(node_l CHAR(1),node_r CHAR(1));
CREATE NONCLUSTERED INDEX NIX_tree_node_l ON #tree(node_l)INCLUDE(node_r); -- covering indices to speed up lookup
CREATE NONCLUSTERED INDEX NIX_tree_node_r ON #tree(node_r)INCLUDE(node_l);
INSERT INTO #tree(node_l,node_r) VALUES
('a','c'),('b','f'),('a','g'),('c','h'),('b','j'),('d','f'),('e','k'),('i','i'),('l','h'); -- test set 1
--('a','f'),('a','g'),(CHAR(0),'a'),('b','c'),('b','a'),('b','h'),('b','j'),('b',CHAR(0)),('b',CHAR(0)),('b','g'),('c','k'),('c','b'),('d','l'),('d','f'),('d','g'),('d','m'),('d','a'),('d',CHAR(0)),('d','a'),('e','c'),('e','b'),('e',CHAR(0)); -- test set 2
--('a','a'),('b','b'),('c','a'),('c','b'),('c','c'); -- test set 3
CREATE TABLE #sets(node CHAR(1) PRIMARY KEY,group_id INT); -- nodes with group id assigned
CREATE TABLE #visitor_queue(node CHAR(1)); -- contains nodes to visit
CREATE TABLE #visited_nodes(node CHAR(1) PRIMARY KEY CLUSTERED WITH(IGNORE_DUP_KEY=ON)); -- nodes visited for nodes on the queue; ignore duplicate nodes when inserted
CREATE TABLE #visitor_ctx(node_l CHAR(1),node_r CHAR(1)); -- context table, contains deleted nodes as they are visited from #tree
DECLARE @last_created_group_id INT=0;
-- Notes:
-- 1. This algorithm is destructive in its input set, ie #tree will be empty at the end of this procedure
-- 2. This algorithm does not accept NULL values. Populate #tree with CHAR(0) for NULL values (using ISNULL(source_col,CHAR(0)), or COALESCE(source_col,CHAR(0)))
-- 3. When selecting from #sets, to regain the original NULL values use NULLIF(node,CHAR(0))
WHILE EXISTS(SELECT*FROM #tree)
BEGIN
TRUNCATE TABLE #visited_nodes;
TRUNCATE TABLE #visitor_ctx;
-- push first nodes onto the queue (via #visitor_ctx -> #visitor_queue)
DELETE TOP (1) t
OUTPUT deleted.node_l,deleted.node_r INTO #visitor_ctx(node_l,node_r)
FROM #tree AS t;
INSERT INTO #visitor_queue(node) SELECT node_l FROM #visitor_ctx UNION SELECT node_r FROM #visitor_ctx; -- UNION to filter when node_l equals node_r
INSERT INTO #visited_nodes(node) SELECT node FROM #visitor_queue; -- keep track of nodes visited
-- work down the queue by visiting linked nodes in #tree; nodes are deleted as they are visited
WHILE EXISTS(SELECT*FROM #visitor_queue)
BEGIN
TRUNCATE TABLE #visitor_ctx;
-- pop_front for node on the queue (via #visitor_ctx -> @node)
DELETE TOP (1) s
OUTPUT deleted.node INTO #visitor_ctx(node_l)
FROM #visitor_queue AS s;
DECLARE @node CHAR(1)=(SELECT node_l FROM #visitor_ctx);
TRUNCATE TABLE #visitor_ctx;
-- visit nodes in #tree where node_l or node_r equal target @node;
-- delete visited nodes from #tree, output to #visitor_ctx
DELETE t
OUTPUT deleted.node_l,deleted.node_r INTO #visitor_ctx(node_l,node_r)
FROM #tree AS t
WHERE t.node_l=@node OR t.node_r=@node;
-- insert visited nodes in the queue that haven't been visited before
INSERT INTO #visitor_queue(node)
(SELECT node_l FROM #visitor_ctx UNION SELECT node_r FROM #visitor_ctx) EXCEPT (SELECT node FROM #visited_nodes);
-- keep track of visited nodes (duplicates are ignored by the IGNORE_DUP_KEY option for the PK)
INSERT INTO #visited_nodes(node)
SELECT node_l FROM #visitor_ctx UNION SELECT node_r FROM #visitor_ctx;
END
SET @last_created_group_id+=1; -- create new group id
-- insert group into #sets
INSERT INTO #sets(group_id,node)
SELECT group_id=@last_created_group_id,node
FROM #visited_nodes;
END
SELECT node=NULLIF(node,CHAR(0)),group_id FROM #sets ORDER BY node; -- nodes with their assigned group id
SELECT g.group_id,m.members -- groups with their members
FROM
(SELECT DISTINCT group_id FROM #sets) AS g
CROSS APPLY (
SELECT members=STUFF((
SELECT ','+ISNULL(CAST(NULLIF(si.node,CHAR(0)) AS VARCHAR(4)),'NULL')
FROM #sets AS si
WHERE si.group_id=g.group_id
FOR XML PATH('')
),1,1,'')
) AS m
ORDER BY g.group_id;
DROP TABLE #visitor_queue;
DROP TABLE #visited_nodes;
DROP TABLE #visitor_ctx;
DROP TABLE #sets;
DROP TABLE #tree;
Output for set 1:
+------+----------+
| node | group_id |
+------+----------+
| a | 1 |
| b | 2 |
| c | 1 |
| d | 2 |
| e | 4 |
| f | 2 |
| g | 1 |
| h | 1 |
| i | 3 |
| j | 2 |
| k | 4 |
| l | 1 |
+------+----------+
Output for set 2:
+------+----------+
| node | group_id |
+------+----------+
| NULL | 1 |
| a | 1 |
| b | 1 |
| c | 1 |
| d | 1 |
| e | 1 |
| f | 1 |
| g | 1 |
| h | 1 |
| j | 1 |
| k | 1 |
| l | 1 |
| m | 1 |
+------+----------+
Output for set 3:
+------+----------+
| node | group_id |
+------+----------+
| a | 1 |
| b | 1 |
| c | 1 |
+------+----------+
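The control flow of the script is easier to follow in a compact form. The Python sketch below mirrors it loosely, with plain lists and sets in place of the temp tables; group_by_queue is a hypothetical name and the code is not a translation of the T-SQL. One difference to keep in mind: DELETE TOP (1) picks rows in no guaranteed order, so the script may number the groups differently than this sketch, which always takes the first remaining edge.

```python
from collections import deque

def group_by_queue(edges):
    """Assign a group id to every node by consuming edges breadth-first."""
    remaining = list(edges)   # plays the role of #tree and is emptied as we go
    groups = {}               # node -> group id, like #sets
    group_id = 0
    while remaining:
        first = remaining.pop(0)
        visited = set(first)              # like #visited_nodes
        queue = deque(sorted(visited))    # like #visitor_queue
        while queue:
            node = queue.popleft()
            # Take every edge that touches the node out of the input.
            touched = [e for e in remaining if node in e]
            remaining = [e for e in remaining if node not in e]
            for edge in touched:
                for other in edge:
                    if other not in visited:
                        visited.add(other)
                        queue.append(other)
        group_id += 1
        for node in visited:
            groups[node] = group_id
    return groups

test_set_1 = [("a", "c"), ("b", "f"), ("a", "g"), ("c", "h"), ("b", "j"),
              ("d", "f"), ("e", "k"), ("i", "i"), ("l", "h")]
groups = group_by_queue(test_set_1)
print(sorted(groups.items()))
```

Because each edge is removed once it has been visited, the work is roughly proportional to the number of edges per component rather than the number of paths.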
My suggestion is to use a stored procedure with a cursor. It is easy to implement and relatively fast. Only two steps:
Query:
CREATE TABLE #PairIds
(
Ident1 VARCHAR(10),
Ident2 VARCHAR(10)
)
INSERT INTO #PairIds
VALUES ('a', 'c'),
('b', 'f'),
('a', 'g'),
('c', 'h'),
('b', 'j'),
('d', 'f'),
('e', 'k'),
('l', 'h')
exec [dbo].[sp_GetIdentByGroup]
Result:
Ident | GroupID
---------------------
a | 1
b | 2
c | 1
d | 2
e | 3
f | 2
g | 1
h | 1
j | 2
k | 3
l | 1
Code for creating the stored procedure:
CREATE PROCEDURE [dbo].[sp_GetIdentByGroup]
AS
BEGIN
DECLARE @message VARCHAR(70);
DECLARE @IdentInput1 varchar(20)
DECLARE @IdentInput2 varchar(20)
DECLARE @Counter INT
DECLARE @Group1 INT
DECLARE @Group2 INT
DECLARE @Ident varchar(20)
DECLARE @IdentCheck1 varchar(20)
DECLARE @IdentCheck2 varchar(20)
SET @Counter = 1
DECLARE @IdentByGroupCursor TABLE (
Ident varchar(20) UNIQUE CLUSTERED,
GroupID INT
);
-- Use a cursor to select your data, which enables SQL Server to extract
-- the data from your local table to the variables.
declare ins_cursor cursor for
select Ident1, Ident2 from #PairIds
open ins_cursor
fetch next from ins_cursor into @IdentInput1, @IdentInput2 -- At this point, the data from the first row
-- is in your local variables.
-- Move through the table with the @@FETCH_STATUS=0
WHILE @@FETCH_STATUS=0
BEGIN
SET @Group1 = null
SET @Group2 = null
SELECT TOP 1 @Group1 = GroupID, @IdentCheck1 = Ident
FROM @IdentByGroupCursor
WHERE Ident in (@IdentInput1)
SELECT TOP 1 @Group2 = GroupID, @IdentCheck2 = Ident
FROM @IdentByGroupCursor
WHERE Ident in (@IdentInput2)
IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL)
BEGIN
IF @Group1 > @Group2
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group2
WHERE
GroupID = @Group1
END
IF @Group2 > @Group1
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group1
WHERE
GroupID = @Group2
END
END
ELSE IF @Group1 IS NOT NULL
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group1
WHERE
Ident IN (@IdentInput1)
END
ELSE IF @Group2 IS NOT NULL
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group2
WHERE
Ident IN (@IdentInput2)
END
IF (@Group1 IS NOT NULL AND @Group2 IS NOT NULL)
BEGIN
IF @Group1 > @Group2
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group2
WHERE
GroupID = @Group1
END
IF @Group2 > @Group1
BEGIN
UPDATE @IdentByGroupCursor
SET GroupID = @Group1
WHERE
GroupID = @Group2
END
END
IF @Group1 IS NULL
BEGIN
INSERT INTO @IdentByGroupCursor (Ident, GroupID)
VALUES (@IdentInput1, ISNULL(@Group2, @Counter))
END
IF @Group2 IS NULL
BEGIN
INSERT INTO @IdentByGroupCursor (Ident, GroupID)
VALUES (@IdentInput2, ISNULL(@Group1, @Counter))
END
IF (@Group1 IS NULL OR @Group2 IS NULL)
BEGIN
SET @Counter = @Counter + 1
END
-- Once the execution has taken place, you fetch the next row of data from your local table.
fetch next from ins_cursor into @IdentInput1, @IdentInput2
End
-- When all the rows have inserted you must close and deallocate the cursor.
-- Failure to do this will not let you re-use the cursor.
close ins_cursor
deallocate ins_cursor
SELECT Ident ,DENSE_RANK() OVER( ORDER BY GroupID ASC) AS GroupID
FROM @IdentByGroupCursor
ORDER BY Ident
END
GO
sp_GetIdentByGroup has an index for speed and uses a cursor to prepare the desired result set. The stored procedure expects the #PairIds table to exist.
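The procedure merges groups row by row, always keeping the smaller group ID. A closely related textbook technique is union-find (disjoint sets), sketched below in Python. To be clear, this is a swapped-in technique for comparison and not the procedure's own logic; group_by_union_find is a hypothetical name.

```python
def group_by_union_find(pairs):
    """Group identifiers with union-find, then number the groups densely."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps the trees shallow
            x = parent[x]
        return x

    for left, right in pairs:
        root_l, root_r = find(left), find(right)
        if root_l != root_r:
            # Keep the smaller root, similar to keeping the smaller GroupID.
            parent[max(root_l, root_r)] = min(root_l, root_r)

    # Renumber the roots 1, 2, 3, ... in sorted order, like the final DENSE_RANK.
    roots = sorted({find(x) for x in parent})
    number = {root: n for n, root in enumerate(roots, start=1)}
    return {x: number[find(x)] for x in sorted(parent)}

pair_ids = [("a", "c"), ("b", "f"), ("a", "g"), ("c", "h"),
            ("b", "j"), ("d", "f"), ("e", "k"), ("l", "h")]
print(group_by_union_find(pair_ids))
```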
More information: SQL How to group identifiers that are related with each other in specific groups.
sp_GetIdentByGroup is a fantastic approach. I was searching for a few days for a similar solution using CTE functions, but they were too slow. For example, 20 records were done in 5 sec, 30 records in 1:40 min, and 35 ran for ages.
Your procedure managed to rank 150 items into groups in a split second. Thanks!