SQL: Remove duplicates in self-join

I have the following table (called t1):

| id | Name    |
| 1  | Charlie |
| 2  | Bob     |
| 3  | Alice   |

I want to match the table with itself (self-join) but only choose a combination that has not already appeared. So far, I have the following:

select * from t1 a, t1 b
where a.id != b.id

which gives me this result:

| a.id | a.Name  | b.id | b.Name  |
| 2    | Bob     | 1    | Charlie | 
| 3    | Alice   | 1    | Charlie | 
| 1    | Charlie | 2    | Bob     | 
| 3    | Alice   | 2    | Bob     | 
| 1    | Charlie | 3    | Alice   | 
| 2    | Bob     | 3    | Alice   | 

I only want an id to appear once from table a, and once from table b. A desired outcome would be:

| a.id | a.Name  | b.id | b.Name  |
| 2    | Bob     | 1    | Charlie | 
| 3    | Alice   | 2    | Bob     | 
| 1    | Charlie | 3    | Alice   |

But I'm stumped as to how to guarantee this.

I am using SQL Server 2017.

Here's a fiddle with my test: DEMO

PS: I've checked this question, but the concept of the solution using a "less than" as a comparison operator isn't clear to me in my own example.

Edit: There are no rules as to which pair is chosen; the pairs could be (2,3), (3,1), (1,2) instead of the ones I presented above. The only rules I care about are that each id appears exactly once from table a and exactly once from table b, and that a.id != b.id.

Edit 2: There is no logic for matching them. Think of it with this premise: I am matchmaking Alice, Bob and Charlie as if they were having a Secret Gift Exchange. Each person can give a gift to only one person, can receive only one gift, and cannot give a gift to themselves. (I think this allows scalability)

Here is one option which uses a ROW_NUMBER trick to stagger each name with a different name:

WITH cte AS (
    SELECT id, Name, ROW_NUMBER() OVER (ORDER BY id) rn
    FROM t1
)

SELECT
    t1.Name,
    t2.Name
FROM cte t1
INNER JOIN cte t2
    ON (t1.rn % (SELECT COUNT(*) FROM cte)) + 1 = t2.rn;

The logic is to just match row number 1 with 2, 2 with 3, and 3 with 1 (we use the modulus to wrap around at the edge case). This ensures that no name would ever appear more than once in a given column.
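
To make the wrap-around concrete, here is a minimal self-contained sketch of the same idea. The table variable @people, its deliberately non-contiguous ids, and the giver/receiver aliases are my own illustrative assumptions, not part of the original answer; the point is only that the pairing depends on the row numbers, not on the id values.

```sql
-- Hypothetical sample data; the ids are non-contiguous on purpose.
DECLARE @people TABLE (id int PRIMARY KEY, Name varchar(50));
INSERT INTO @people (id, Name)
VALUES (10, 'Charlie'), (20, 'Bob'), (35, 'Alice');

WITH cte AS (
    SELECT id,
           Name,
           ROW_NUMBER() OVER (ORDER BY id) AS rn,  -- 1..n regardless of gaps in id
           COUNT(*) OVER ()                AS n    -- total row count on every row
    FROM @people
)
SELECT giver.id      AS a_id,
       giver.Name    AS a_name,
       receiver.id   AS b_id,
       receiver.Name AS b_name
FROM cte AS giver
INNER JOIN cte AS receiver
    ON (giver.rn % giver.n) + 1 = receiver.rn      -- 1 -> 2, 2 -> 3, ..., n -> 1
ORDER BY giver.rn;
```

With this sample data the rows pair up as Charlie with Bob, Bob with Alice, and Alice with Charlie. Note that with a single row the join condition can never be satisfied, so the query returns nothing rather than pairing the row with itself.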

The OP wants to assign each person a random partner. The solution is not completely random and only works if the IDs are contiguous. However, that can be fixed by combining a random ordering with ORDER BY and ROW_NUMBER.

My lazy fix, which assumes the ids run contiguously from 1 to n, is:

select * from t1 a, t1 b
where a.id = b.id % ( select count(*) from t1 c) + 1
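
One possible way to apply the random-ordering idea is sketched below. Ordering by NEWID() and the temp table name #shuffled are my assumptions, not something the answer specifies. The shuffled numbering is stored in a temp table first because a CTE that is referenced twice may be evaluated twice, and each evaluation of NEWID() could then produce a different numbering.

```sql
-- Assumes t1(id, Name) as in the question; #shuffled is a hypothetical temp table.
-- Store one random numbering so both sides of the self-join see the same rn values.
SELECT id,
       Name,
       ROW_NUMBER() OVER (ORDER BY NEWID()) AS rn
INTO #shuffled
FROM t1;

DECLARE @n int = (SELECT COUNT(*) FROM #shuffled);

-- Shift by one position with wrap-around: rn 1 -> 2, 2 -> 3, ..., n -> 1.
SELECT g.id   AS a_id,
       g.Name AS a_name,
       r.id   AS b_id,
       r.Name AS b_name
FROM #shuffled AS g
INNER JOIN #shuffled AS r
    ON (g.rn % @n) + 1 = r.rn;

DROP TABLE #shuffled;
```

This works with gaps in the ids and gives a different pairing on each run. It always produces one single cycle through all the rows, so it does not cover every valid assignment (for example, two separate two-person swaps among four people can never come out of it).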

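Use row_number(), then do a self-join on the row number. Note that pairing the ascending order with the descending order matches the middle row with itself when the row count is odd, so this satisfies a.id != b.id only for an even number of rows.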

select a.id, a.name, b.id, b.name from
    (select row_number() over (order by id desc) rn, id, name from t1) a
join 
    (select row_number() over (order by id asc) rn, id, name from t1) b on a.rn= b.rn

Here is another way to do this.

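This partitions the rows by a key built from the two ids, always concatenated with the larger id first (larger_id,'|',smaller_id), so that a pair and its mirror image fall into the same partition.

After that I keep just one row per key by filtering on rnk=1. This removes the mirrored duplicates, although a given id can still appear more than once in the same column.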

with data
as (
select a.id a_id,a.name as a_name,b.name as b_name,b.id b_id
       ,row_number() over(partition by case when a.id>b.id then concat(a.id,'|',b.id) 
                                            else concat(b.id,'|',a.id) end
                              order by b.id desc)
        as rnk
  from t1 a
  join t1 b
    on a.id != b.id
     )
select *
  from data
 where rnk=1

https://dbfiddle.uk/?rdbms=sqlserver_2017&fiddle=c3f82c8d21dc14899a263adacf1b31e6
