
How can I efficiently extract a sub-table which only contains rows that have duplicated elements in SQL?

The main task is to obtain a sub-table (apologies if this is not quite the correct term) from an existing table, keeping only the rows of interest. A row is of interest if any of its values also appears in any other row, in either column.

Any explanation of the best way to go about this would be very helpful.

I have considered performing queries to check for each element in each row, and then simply making a union out of all the query results.

This is the basic version of what I tried, although it is probably inefficient. Note that there are 3 columns and I am only checking for duplicated values within 2 of them (PARTICIPANT_1, PARTICIPANT_2).

SELECT *
FROM team_table
WHERE PARTICIPANT_2 IN (SELECT PARTICIPANT_2
                        FROM team_table
                        GROUP BY PARTICIPANT_2
                        HAVING COUNT(DISTINCT PARTICIPANT_1) > 1
                       )

UNION
SELECT *
FROM team_table
WHERE PARTICIPANT_1 IN (SELECT PARTICIPANT_1
                        FROM team_table
                        GROUP BY PARTICIPANT_1
                        HAVING COUNT(DISTINCT PARTICIPANT_2) > 1
                       )

For an example table:

startdate PARTICIPANT_1 PARTICIPANT_2
1-1-19      A               B
1-1-19      A               C
1-1-19      C               D
1-1-19      Q               R
1-1-19      S               T
1-1-19      U               V

This should yield the following, since A and C are the repeated elements:

startdate PARTICIPANT_1 PARTICIPANT_2
1-1-19      A               B
1-1-19      A               C
1-1-19      C               D
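
As a sanity check on the attempt above, here is a minimal sketch that loads the example table into an in-memory SQLite database through Python's sqlite3 module (SQLite is an assumption; the question does not name a database) and runs a simplified form of the UNION query. Each half compares a column only with itself, so a value that repeats across the two columns, such as C, is not detected, and the C/D row is missing from the result.

```python
import sqlite3

# In-memory copy of the example table from the question (SQLite is assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_table (startdate TEXT, PARTICIPANT_1 TEXT, PARTICIPANT_2 TEXT)"
)
conn.executemany(
    "INSERT INTO team_table VALUES (?, ?, ?)",
    [
        ("1-1-19", "A", "B"),
        ("1-1-19", "A", "C"),
        ("1-1-19", "C", "D"),
        ("1-1-19", "Q", "R"),
        ("1-1-19", "S", "T"),
        ("1-1-19", "U", "V"),
    ],
)

# Simplified form of the UNION attempt: the nested SELECTs are flattened,
# but the grouping and HAVING conditions are the same.
attempt_sql = """
SELECT * FROM team_table
WHERE PARTICIPANT_2 IN (SELECT PARTICIPANT_2 FROM team_table
                        GROUP BY PARTICIPANT_2
                        HAVING COUNT(DISTINCT PARTICIPANT_1) > 1)
UNION
SELECT * FROM team_table
WHERE PARTICIPANT_1 IN (SELECT PARTICIPANT_1 FROM team_table
                        GROUP BY PARTICIPANT_1
                        HAVING COUNT(DISTINCT PARTICIPANT_2) > 1)
"""

# Sort in Python so the printed order does not depend on the database.
attempt_rows = sorted(conn.execute(attempt_sql).fetchall())
print(attempt_rows)
```

Only the two rows with A in PARTICIPANT_1 come back, which is why a check that looks at both columns at once is needed.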

I think this is what you need:

SELECT * FROM team_table t1
WHERE exists (SELECT 1 from team_table t2
               WHERE t1.startdate = t2.startdate -- don't know if you need this
                 -- Get all rows with duplicate values:
                 AND (t2.PARTICIPANT_1 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2)
                   OR t2.PARTICIPANT_2 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2))
                 -- Exclude the record itself:
                 AND (t1.PARTICIPANT_1 != t2.PARTICIPANT_1
                   OR t1.PARTICIPANT_2 != t2.PARTICIPANT_2))
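
To see what this query returns on the example table from the question, here is a small sketch using Python's sqlite3 module with an in-memory database (the choice of SQLite is an assumption made only so the example can run; the SQL itself is the query above).

```python
import sqlite3

# In-memory copy of the example table from the question (SQLite is assumed).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_table (startdate TEXT, PARTICIPANT_1 TEXT, PARTICIPANT_2 TEXT)"
)
conn.executemany(
    "INSERT INTO team_table VALUES (?, ?, ?)",
    [
        ("1-1-19", "A", "B"),
        ("1-1-19", "A", "C"),
        ("1-1-19", "C", "D"),
        ("1-1-19", "Q", "R"),
        ("1-1-19", "S", "T"),
        ("1-1-19", "U", "V"),
    ],
)

exists_sql = """
SELECT * FROM team_table t1
WHERE EXISTS (SELECT 1 FROM team_table t2
              WHERE t1.startdate = t2.startdate
                -- another row shares a value in either column
                AND (t2.PARTICIPANT_1 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2)
                  OR t2.PARTICIPANT_2 IN (t1.PARTICIPANT_1, t1.PARTICIPANT_2))
                -- and that row differs from the current one in at least one column
                AND (t1.PARTICIPANT_1 != t2.PARTICIPANT_1
                  OR t1.PARTICIPANT_2 != t2.PARTICIPANT_2))
"""

# Sort in Python so the printed order does not depend on the database.
exists_rows = sorted(conn.execute(exists_sql).fetchall())
print(exists_rows)
```

The A/B, A/C and C/D rows are kept, matching the expected output in the question. Note that the self-exclusion compares values, so two rows that are identical in both participant columns do not count as duplicates of each other.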

If you have a unique id column, you can use:

select tt.*
from team_table tt
where exists (select 1
              from team_table tt2
              where (tt.participant_1 in (tt2.participant_1, tt2.participant_2) or
                     tt.participant_2 in (tt2.participant_1, tt2.participant_2)
                    ) and
                    tt2.id <> tt.id
             );
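
The sketch below illustrates why a unique id helps. It assumes SQLite (through Python's sqlite3 module), adds a hypothetical id column to the example table, and adds two hypothetical rows X/Y that are identical to each other. Excluding "the row itself" by id lets those two rows match one another, whereas excluding it by comparing the participant values does not.

```python
import sqlite3

# Example table from the question plus a hypothetical id column and two
# hypothetical identical rows (X, Y), added only for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE team_table "
    "(id INTEGER PRIMARY KEY, startdate TEXT, participant_1 TEXT, participant_2 TEXT)"
)
conn.executemany(
    "INSERT INTO team_table VALUES (?, ?, ?, ?)",
    [
        (1, "1-1-19", "A", "B"),
        (2, "1-1-19", "A", "C"),
        (3, "1-1-19", "C", "D"),
        (4, "1-1-19", "Q", "R"),
        (5, "1-1-19", "S", "T"),
        (6, "1-1-19", "U", "V"),
        (7, "1-1-19", "X", "Y"),
        (8, "1-1-19", "X", "Y"),
    ],
)

# Self-exclusion by id: any *other* row sharing a value qualifies.
by_id_sql = """
SELECT tt.id FROM team_table tt
WHERE EXISTS (SELECT 1 FROM team_table tt2
              WHERE (tt.participant_1 IN (tt2.participant_1, tt2.participant_2)
                  OR tt.participant_2 IN (tt2.participant_1, tt2.participant_2))
                AND tt2.id <> tt.id)
"""

# Self-exclusion by value: rows identical in both columns exclude each other.
by_value_sql = """
SELECT t1.id FROM team_table t1
WHERE EXISTS (SELECT 1 FROM team_table t2
              WHERE (t2.participant_1 IN (t1.participant_1, t1.participant_2)
                  OR t2.participant_2 IN (t1.participant_1, t1.participant_2))
                AND (t1.participant_1 != t2.participant_1
                  OR t1.participant_2 != t2.participant_2))
"""

ids_by_id = sorted(row[0] for row in conn.execute(by_id_sql))
ids_by_value = sorted(row[0] for row in conn.execute(by_value_sql))
print(ids_by_id, ids_by_value)
```

Which behavior is wanted depends on whether fully identical rows should count as duplicates in your data.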

If you don't have one, you can actually generate one:

with tt as (
      -- number the rows across the whole table so that seqnum is unique per row
      select t.*,
             row_number() over (order by startdate, participant_1, participant_2) as seqnum
      from team_table t
     )
select tt.*
from tt
where exists (select 1
              from tt tt2
              where (tt.participant_1 in (tt2.participant_1, tt2.participant_2) or
                     tt.participant_2 in (tt2.participant_1, tt2.participant_2)
                    ) and
                    tt2.seqnum <> tt.seqnum
             );
