How to find matching pairs with transitivity of equality in MS SQL
Suppose I have a table with data like this:
x1    y1  x2    y2
A001  1   B001  2
A001  1   B002  2
A002  2   A001  1
C001  2   B003  3
C002  1   B003  3
What would a SQL query (Microsoft SQL Server) look like that produces this result?
GroupId  x     y
1        A001  1
1        B001  2
1        B002  2
1        A002  2
2        C001  2
2        C002  1
2        B003  3
The idea is to group equal pairs transitively, e.g. if a == b and b == c, then a == c.
If recursion is not needed, you can do this with the dense_rank() function:
;with cte1(x, y, xp, yp) as (
select x1, y1, x2, y2 from Table1 union all
select x2, y2, x1, y1 from Table1
), cte2 as (
select
x, y, xp, yp,
case
when x < xp then x + cast(y as nvarchar(max))
else xp + cast(yp as nvarchar(max))
end as grp
from cte1
)
select distinct
x, y,
dense_rank() over (order by grp) as grp
from cte2
order by grp, x, y
See the SQL Fiddle with an example.
Hope that helps.
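The CASE expression above labels every pair by its smaller member, so it only groups correctly when every member of a group is paired directly with that member. As a minimal sketch of the limitation, assume a hypothetical two-row input @Chain (not part of the original data) in which A001 = B001 and B001 = B002:

```sql
-- Hypothetical chained input: A001 = B001 and B001 = B002
declare @Chain table (x1 varchar(4), y1 int, x2 varchar(4), y2 int);
insert into @Chain values
    ('A001', 1, 'B001', 2),
    ('B001', 2, 'B002', 2);

-- Same query as above, pointed at @Chain instead of Table1
;with cte1(x, y, xp, yp) as (
    select x1, y1, x2, y2 from @Chain union all
    select x2, y2, x1, y1 from @Chain
), cte2 as (
    select
        x, y,
        case
            when x < xp then x + cast(y as nvarchar(max))
            else xp + cast(yp as nvarchar(max))
        end as grp
    from cte1
)
select distinct
    x, y,
    dense_rank() over (order by grp) as grp
from cte2
order by grp, x, y;
```

With this input, B001 2 should come back twice, once under each label, because the pair (B001, B002) never sees A001. Chained equalities therefore need an iterative or recursive step.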
Update: it actually turned out to be easier to do this with table variables:
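-- @Table1 holds the input pairs; it is declared here with the question's sample data so the batch runs on its own
declare @Table1 table (x1 varchar(4), y1 int, x2 varchar(4), y2 int)
insert into @Table1 values
('A001', 1, 'B001', 2),
('A001', 1, 'B002', 2),
('A002', 2, 'A001', 1),
('C001', 2, 'B003', 3),
('C002', 1, 'B003', 3)
-- @Table2: one row per distinct (x, y) with its current group label
declare @Table2 table (x varchar(4), y int, grp int)
-- @Table3: every pair stored in both directions
declare @Table3 table (x varchar(4), y int, xp varchar(4), yp int)
declare @i int = 1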
insert into @Table2
select x, y, row_number() over (order by x, y) as grp
from
(
select distinct x1, y1 from @Table1 union
select distinct x2, y2 from @Table1
) as a(x, y)
insert into @Table3
select x1, y1, x2, y2 from @Table1 union
select x2, y2, x1, y1 from @Table1
while @i > 0
begin
update T2 set
grp = T4.grp
from @Table2 as T2
inner join @Table3 as T3 on T3.x = T2.x and T3.y = T2.y
inner join @Table2 as T4 on T4.x = T3.xp and T4.y = T3.yp
where T4.grp < T2.grp
select @i = @@rowcount
end
select x, y, dense_rank() over (order by grp) as GroupId
from @Table2
OK, after some experimenting I came up with the following:
DECLARE @mod int; SET @mod=1;
DECLARE @newgrp int; SET @newgrp=1;
CREATE TABLE tbl([x1] varchar(4), [y1] int, [x2] varchar(4), [y2] int);
INSERT INTO tbl
([x1], [y1], [x2], [y2])
SELECT 'A001', 1, 'B001', 2 -- modified input to create chained equalities:
UNION ALL SELECT 'B001', 2, 'B002', 2 -- --> replaced A001 1 by B001 2
UNION ALL SELECT 'A002', 2, 'B002', 2
UNION ALL SELECT 'C001', 2, 'B003', 3
UNION ALL SELECT 'C002', 1, 'B003', 3;
SELECT x,y,0 grp INTO tmp FROM (
SELECT x1 x,y1 y FROM tbl union SELECT x2 x, y2 y FROM tbl ) t;
-- set first seed: grp=1 on first ID only ...
UPDATE TOP(1) tmp SET grp=1;
-- now iteratively populate the tmp table
WHILE @newgrp>0 -- for each group
BEGIN
WHILE @mod>0
BEGIN -- in case of chained equalities
UPDATE t2 SET grp=tmp.grp FROM tmp
INNER JOIN ( SELECT x1,x2 FROM tbl
UNION SELECT x2,x1 FROM tbl ) -- do group assignments in both directions!
tt ON tt.x1 = tmp.x AND tmp.grp>0
INNER JOIN tmp t2 ON t2.x = x2 AND t2.grp=0
SET @mod=@@ROWCOUNT;
END
-- OK, move on to the next group and then repeat the game ...
UPDATE TOP(1) tmp SET grp=(SELECT MAX(grp) FROM tmp)+1 WHERE grp=0
SELECT @newgrp=@@ROWCOUNT, @mod=1;
END
-- show the result
SELECT * FROM tmp
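Result:
x     y   grp
----  --- ---
A001  1   1
A002  2   1
B001  2   1
B002  2   1
B003  3   2
C001  2   2
C002  1   2
The example script above assumes that the y column is irrelevant for the comparison (each x value in the sample data has exactly one y value). If necessary, the y column can of course be included in the comparison.
Edit:
And here it is, now with the y column included in the comparison, together with the matching SQL Fiddle (I had typed too many semicolons at first, silly me):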
CREATE TABLE tbl([x1] varchar(4), [y1] int, [x2] varchar(4), [y2] int);
INSERT INTO tbl
(x1, y1, x2, y2)
SELECT 'A001', 1, 'B001', 2
UNION ALL SELECT 'A001', 1, 'B002', 2
UNION ALL SELECT 'A002', 2, 'A001', 1
UNION ALL SELECT 'D001', 3, 'B003', 3
UNION ALL SELECT 'D003', 1, 'D001', 3
UNION ALL SELECT 'D001', 1, 'A001', 1
UNION ALL SELECT 'C001', 2, 'B003', 3
UNION ALL SELECT 'C002', 1, 'B003', 3
-- start of processing ...
SELECT x,y,0 grp INTO tmp FROM (
SELECT x1 x,y1 y FROM tbl union SELECT x2 x, y2 y FROM tbl ) t;
DECLARE @mod int
SET @mod=1
DECLARE @newgrp int
SET @newgrp=1
UPDATE TOP(1) tmp SET grp=1 -- set first grp-label (seed)
-- now iteratively populate the tmp table
WHILE @newgrp>0 -- for each group
BEGIN
WHILE @mod>0 -- in case of chained equalities
BEGIN
UPDATE t2 SET grp=tmp.grp FROM tmp
INNER JOIN ( SELECT x1,y1,x2,y2 FROM tbl
UNION SELECT x2,y2,x1,y1 FROM tbl ) -- do group assignments in both directions!
tt ON tt.x1 = tmp.x AND tt.y1 = tmp.y AND tmp.grp>0
INNER JOIN tmp t2 ON t2.x = tt.x2 AND t2.y = tt.y2 AND t2.grp=0
SET @mod=@@ROWCOUNT
END
-- OK, move on to the next group and then repeat the game ...
UPDATE TOP(1) tmp SET grp=(SELECT MAX(grp) FROM tmp)+1 WHERE grp=0
SELECT @newgrp=@@ROWCOUNT, @mod=1
END
-- show the result
SELECT * FROM tmp
-- and drop tmp again
DROP TABLE tmp
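I also added some sample data to show chained equalities ('D001' 3 = 'B003' 3 and 'D003' 1 = 'D001' 3) and a case where the same x occurs with different y values ('D001' 1 and 'D001' 3). The WHILE loops gave me a bit of a headache, because at first I did not pay enough attention to the contents of @@ROWCOUNT ... it should work again now!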
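The @@ROWCOUNT pitfall mentioned above can be sketched as follows. This is only an illustration with a hypothetical scratch table @t, not part of the script: the value has to be copied into a variable in the statement directly after the UPDATE, because later statements overwrite it.

```sql
-- Hypothetical scratch table, used only to produce a known row count
declare @t table (id int);
insert into @t values (1), (2), (3);

declare @rows int;

update @t set id = id + 10 where id > 1;   -- changes 2 rows
set @rows = @@rowcount;                    -- capture immediately after the UPDATE

-- @rows keeps the UPDATE's count; @@rowcount now reflects the SET statement instead
select @rows as captured_rows, @@rowcount as rowcount_after_set;
```

captured_rows should be 2, while rowcount_after_set should be 1, since a simple variable assignment is documented to set @@ROWCOUNT to 1. This is why the loops above read @@ROWCOUNT in the very next statement after each UPDATE.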
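The chained-equality case (recursion) is the main difficulty in this query. Without it, everything could be done in a single statement (see @Roman Pekar's answer).
Result for my (extended) example:
x     y   grp
----  --- ---
A001  1   1
A002  2   1
B001  2   1
B002  2   1
B003  3   2
C001  2   2
C002  1   2
D001  1   1
D001  3   2
D003  1   2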
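A result like this can be sanity-checked by looking for pairs whose two members landed in different groups. The sketch below assumes two hypothetical table variables, @Pairs and @Groups, filled with a few rows taken from the example above. Against the script's own tables, tbl and tmp would take their place.

```sql
-- Hypothetical inputs: a few pairs and the grouping to be checked
declare @Pairs table (x1 varchar(4), y1 int, x2 varchar(4), y2 int);
declare @Groups table (x varchar(4), y int, grp int);

insert into @Pairs values
    ('D001', 3, 'B003', 3),
    ('D003', 1, 'D001', 3),
    ('D001', 1, 'A001', 1);

insert into @Groups values
    ('A001', 1, 1), ('D001', 1, 1),
    ('B003', 3, 2), ('D001', 3, 2), ('D003', 1, 2);

-- Any row returned is a pair whose two members ended up in different groups
select p.x1, p.y1, g1.grp as grp1, p.x2, p.y2, g2.grp as grp2
from @Pairs as p
inner join @Groups as g1 on g1.x = p.x1 and g1.y = p.y1
inner join @Groups as g2 on g2.x = p.x2 and g2.y = p.y2
where g1.grp <> g2.grp;
```

For these rows the query should return nothing. Note that this is only a necessary condition: it detects groups that were split apart, not groups that were merged although no chain of pairs connects them.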