
How do I select rows where some fields are duplicated but other fields are compared in MySQL?

I have one table with four data fields. How do I find the records that have duplicate values in Species AND duplicate values in Location, where Foo is compared to Bar in other records? (I'm looking for Foo less than Bar.)

RecordId    Species   Location    Foo    Bar
1           Cat       home        4      9
2           Dog       home        4      9
3           Cat       home        3      7
4           Bunny     home        4      9
5           Cat       home        1      2

I want to find records 1 and 3. Both have Cat (in Species) AND home (in Location), and Foo in record 1 is 4, which is less than Bar in record 3 (which is 7). Record 5 doesn't match because Foo in record 1 is not less than Bar in record 5.

If I have not worded the question properly, please don't just close it. I am happy to edit if need be.

Let's break it down:

Part 1: Records that share duplicate values in Species and Location:

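This is straightforward enough. The query below returns all of the (Species, Location) tuples that appear more than once in the source table. (Here "table" is a placeholder for your real table name; TABLE is a reserved word in MySQL, so a table literally named that would have to be quoted with backticks.)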

SELECT
    Species,
    Location
FROM
    table
GROUP BY
    Species,
    Location
HAVING
    COUNT(*) > 1

This gives these results:

Species   Location
Cat       home
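If you want to try this step without a MySQL server, here is a minimal sketch that runs the same GROUP BY / HAVING query against an in-memory SQLite database through Python's sqlite3 module. SQLite and the table name animals are my assumptions for illustration, not part of the question; the query itself uses only standard SQL, so it should carry over to MySQL unchanged.

```python
import sqlite3

# Sample rows from the question: (RecordId, Species, Location, Foo, Bar).
ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

conn = sqlite3.connect(":memory:")
# "animals" is a hypothetical table name standing in for the placeholder "table".
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)

# One output row per (Species, Location) pair that occurs more than once.
duplicate_groups = conn.execute(
    """
    SELECT Species, Location
    FROM animals
    GROUP BY Species, Location
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_groups)  # [('Cat', 'home')]
```

Dog and Bunny each occur once at home, so HAVING COUNT(*) > 1 drops them and only the Cat/home group is left.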

Next we want the original raw (non-grouped) records that have these known-duplicate values. We get them by INNER JOINing this result back onto the original table:

SELECT
    table.*
FROM
    table
    INNER JOIN
    (
        SELECT
            Species,
            Location
        FROM
            table
        GROUP BY
            Species,
            Location
        HAVING
            COUNT(*) > 1
    ) AS duplicates ON
        table.Species  = duplicates.Species AND
        table.Location = duplicates.Location

(It can be tempting to write this as a WHERE subquery, but that is much less flexible and a less "relational" way of thinking about the problem.)

This then gives these results:

RecordId    Species   Location  Foo    Bar
1           Cat       home      4      9
3           Cat       home      3      7
5           Cat       home      1      2
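For comparison, here is a sketch of what the WHERE-subquery form mentioned above might look like, run side by side with the JOIN form. As before, SQLite via Python's sqlite3 and the table name animals are assumptions made so the example is self-contained; the correlated COUNT(*) subquery is just one of several ways to write the WHERE version.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

conn = sqlite3.connect(":memory:")
# "animals" is a hypothetical table name standing in for the placeholder "table".
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)

# JOIN form: join the raw rows to the set of duplicated groups.
join_form = """
    SELECT a.*
    FROM animals AS a
    INNER JOIN (
        SELECT Species, Location
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    ) AS duplicates
        ON a.Species = duplicates.Species
       AND a.Location = duplicates.Location
    ORDER BY a.RecordId
"""

# WHERE form: count, per row, how many rows share its Species and Location.
where_form = """
    SELECT a.*
    FROM animals AS a
    WHERE (
        SELECT COUNT(*)
        FROM animals AS d
        WHERE d.Species = a.Species
          AND d.Location = a.Location
    ) > 1
    ORDER BY a.RecordId
"""

join_rows = conn.execute(join_form).fetchall()
where_rows = conn.execute(where_form).fetchall()

print(join_rows == where_rows)            # True
print([row[0] for row in where_rows])     # [1, 3, 5]
```

Both forms return the same rows here. The difference is that the WHERE form only filters, while the JOIN form also makes the subquery's columns available to the outer query.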

Part 2: Filter based on Foo and Bar :

This is more complicated. Here are the rules you gave:

  • Record 1: Included because its Foo == 4 is less than Record 3's Bar == 7.
  • Record 3: You don't explain why Record 3 is included, other than that it meets the duplicate criteria, nor why Record 3's Bar == 7 is the value compared with Record 1 when Record 1's own Bar == 9 is higher.
  • Record 5: Excluded because Record 1's Foo == 4 is greater than Record 5's Bar == 2.

My interpretation is that inside each group (records 1, 3 and 5 in this case) you only want the records where Bar > MAX(Foo). In this case MAX(Foo) == 4, so records 1 and 3 are included because 9 > 4 and 7 > 4 respectively, but record 5 is excluded because 2 > 4 is false.
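To make the ambiguity concrete, here is a sketch of one literal reading of the question: a self-join that lists every ordered pair of different records in the same group where the first record's Foo is less than the second record's Bar. This pairwise query is my own illustration, not something the question specifies, and SQLite via Python's sqlite3 and the table name animals are assumptions for the sake of a runnable example.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

conn = sqlite3.connect(":memory:")
# "animals" is a hypothetical table name standing in for the placeholder "table".
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)

# Ordered pairs (a, b) of different records in the same group with a.Foo < b.Bar.
pairs = conn.execute(
    """
    SELECT a.RecordId, b.RecordId
    FROM animals AS a
    INNER JOIN animals AS b
        ON a.Species = b.Species
       AND a.Location = b.Location
       AND a.RecordId <> b.RecordId
    WHERE a.Foo < b.Bar
    ORDER BY a.RecordId, b.RecordId
    """
).fetchall()

foo_side = sorted({a for a, _ in pairs})  # records whose Foo was compared
bar_side = sorted({b for _, b in pairs})  # records whose Bar was compared

print(pairs)     # [(1, 3), (3, 1), (5, 1), (5, 3)]
print(foo_side)  # [1, 3, 5]
print(bar_side)  # [1, 3]
```

Record 5 shows up on the Foo side of two pairs, so a query that returns the Foo-side records would include it. The expected output (records 1 and 3 only) matches the Bar side, which is consistent with filtering on Bar as in the interpretation above.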

We'll take the groups from the earlier query and add a MAX aggregate:

SELECT
    Species,
    Location,
    MAX( Foo ) AS MaxFoo
FROM
    table
GROUP BY
    Species,
    Location
HAVING
    COUNT(*) > 1

This gives these results:

Species   Location    MaxFoo
Cat       home        4

Because this query returns everything the original subquery did, plus MaxFoo, we don't need to JOIN a second subquery; we can simply edit the existing one in place:

SELECT
    table.*
FROM
    table
    INNER JOIN
    (
        SELECT
            Species,
            Location,
            MAX( Foo ) AS MaxFoo
        FROM
            table
        GROUP BY
            Species,
            Location
        HAVING
            COUNT(*) > 1
    ) AS duplicates ON
        table.Species  = duplicates.Species AND
        table.Location = duplicates.Location
WHERE
    table.Bar > duplicates.MaxFoo

And this query gives you your desired results:

RecordId    Species   Location  Foo    Bar
1           Cat       home      4      9
3           Cat       home      3      7

This query also shows the advantage of JOINing on subqueries instead of using WHERE subqueries: you can perform more operations on the data. For example, if you wanted to include MaxFoo in the output, just change SELECT table.* FROM... to SELECT table.*, duplicates.MaxFoo FROM...
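Here is a runnable sketch of the final query with MaxFoo added to the output. It again assumes an in-memory SQLite database through Python's sqlite3 module and a hypothetical table name animals in place of the placeholder; the SQL is standard enough that I would expect it to behave the same way in MySQL.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

conn = sqlite3.connect(":memory:")
# "animals" is a hypothetical table name standing in for the placeholder "table".
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)

# Keep rows from duplicated groups whose Bar exceeds the group's largest Foo,
# and expose that largest Foo as an extra output column.
final_rows = conn.execute(
    """
    SELECT animals.*, duplicates.MaxFoo
    FROM animals
    INNER JOIN (
        SELECT Species, Location, MAX(Foo) AS MaxFoo
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    ) AS duplicates
        ON animals.Species = duplicates.Species
       AND animals.Location = duplicates.Location
    WHERE animals.Bar > duplicates.MaxFoo
    ORDER BY animals.RecordId
    """
).fetchall()

for row in final_rows:
    print(row)
# (1, 'Cat', 'home', 4, 9, 4)
# (3, 'Cat', 'home', 3, 7, 4)
```

The last column is MaxFoo, taken straight from the joined subquery without any extra lookup.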


 