Finding Matched Pairs using each record only once in SQL Server
I need to find matched pairs of records in SQL Server, but each record can only be included in one pair. Once a record has been matched into a pair, it should be removed from consideration for any future pairs.
I have tried solutions involving ROW_NUMBER() and LEAD(), but I just can't quite get there.
This will be used to pair financial accounts with similar accounts for review, based on multiple customer attributes such as credit score, income, etc.
Statement:
declare @test table (ID numeric, Color varchar(20))
insert into @test values
(1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red')
select *
from @test t1
join @test t2
on t1.Color = t2.Color
and t1.ID < t2.ID -----removes reverse-pairs and self-pairs
Current results:
ID Color ID Color
--- ------- --- --------
1 Blue 3 Blue
1 Blue 5 Blue -----should not appear because 1 has already been paired
3 Blue 5 Blue -----should not appear because 3 and 5 have already been paired
2 Red 6 Red
Needed results:
ID Color ID Color
--- ------- --- --------
1 Blue 3 Blue
2 Red 6 Red
Editing with Max comments
Here is one way to get this done.
I first rank the records within each color by id, so the lowest id gets rnk=1, the next one rnk=2, and so on.
After that I join the ranked set to itself, taking the odd-ranked records (rnk=1, 3, ...) and joining each one to the record with the next rank (rnk+1) in the same color. A record left without a partner, such as the only Yellow row, drops out of the inner join.
declare @test table (ID numeric, Color varchar(20))
insert into @test values
(1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red'),(7,'Blue')
;with data
as (select row_number() over(partition by color order by id asc) as rnk
,color
,id
from @test
)
select a.id,a.color,b.id,b.color
from data a
join data b
on a.Color=b.Color
and b.rnk=a.rnk+1
where a.rnk%2=1
I get the output as follows:
+----+-------+----+-------+
| id | color | id | color |
+----+-------+----+-------+
| 1 | Blue | 3 | Blue |
| 5 | Blue | 7 | Blue |
| 2 | Red | 6 | Red |
+----+-------+----+-------+
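Since the question mentions LEAD(), the same pairing can also be sketched without the self-join. This is only an illustrative variant of the query above, not part of the original answer: it assumes SQL Server 2012 or later (where LEAD() is available), and the column aliases next_id and partner_id are names made up for the example.

```sql
declare @test table (ID numeric, Color varchar(20))
insert into @test values
(1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red'),(7,'Blue')

;with data as (
    select id,
           color,
           -- position of the row within its color, ordered by id
           row_number() over (partition by color order by id) as rnk,
           -- id of the next row in the same color, NULL for the last row
           lead(id) over (partition by color order by id) as next_id   -- hypothetical alias
    from @test
)
select id, color, next_id as partner_id   -- partner_id is a hypothetical alias
from data
where rnk % 2 = 1           -- only odd-ranked rows open a pair
  and next_id is not null   -- drop a leftover row that has no partner
order by color, id
```

With the seven sample rows this should pair 1 with 3, 5 with 7, and 2 with 6, the same pairs as the join version, with the color listed once per pair.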
You could use row_number() and conditional aggregation:
select
max(case when rn % 2 = 0 then id end) id1,
max(case when rn % 2 = 0 then color end) color1,
max(case when rn % 2 = 1 then id end) id2,
max(case when rn % 2 = 1 then color end) color2
from (
select
t.*,
row_number() over(partition by color order by id) - 1 rn
from @test t
) t
group by color, rn / 2
having count(*) = 2
The subquery ranks records having the same color by increasing id, starting from 0. Then, the outer query groups the rows pairwise (integer division rn / 2 puts ranks 0 and 1 in the first group, 2 and 3 in the second, and so on) and keeps only the groups that contain two records.
Demo on DB Fiddle:
id1 | color1 | id2 | color2
:-- | :----- | :-- | :-----
1 | Blue | 3 | Blue
2 | Red | 6 | Red
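To see why grouping by rn / 2 produces pairs, it can help to look at the intermediate values before aggregation. The query below is a small inspection sketch run against the six rows from the question; the aliases pair_no and slot are made up for the example and do not appear in the answer's query.

```sql
declare @test table (ID numeric, Color varchar(20))
insert into @test values
(1,'Blue'),(2,'Red'),(3,'Blue'),(4,'Yellow'),(5,'Blue'),(6,'Red')

select
    t.id,
    t.color,
    t.rn,
    t.rn / 2 as pair_no,   -- integer division: rn 0 and 1 -> 0, rn 2 and 3 -> 1 (hypothetical alias)
    t.rn % 2 as slot       -- 0 = first member of the pair, 1 = second member (hypothetical alias)
from (
    select
        id,
        color,
        -- zero-based rank within each color, ordered by id
        row_number() over(partition by color order by id) - 1 as rn
    from @test
) t
order by t.color, t.rn
```

Rows that share the same color and pair_no form one group. Blue id 5 (pair_no 1) and Yellow id 4 (pair_no 0) each sit alone in their group, which is why having count(*) = 2 removes them from the final result.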