I have 1 table with 4 fields. How do I find the records that have duplicate values in Species AND duplicate values in Location, where Foo is compared to Bar in other records? (I'm looking for Foo less than Bar.)
RecordId Species Location Foo Bar
1 Cat home 4 9
2 Dog home 4 9
3 Cat home 3 7
4 Bunny home 4 9
5 Cat home 1 2
I want to find records 1 and 3. Both have Cat (in Species) AND home (in Location), and Foo in record 1 is 4, which is less than Bar in record 3 (which is 7). Record 5 doesn't match because Foo in record 1 is not less than Bar in record 5.
If I have not worded the question properly, please don't just close it. I am happy to edit if need be.
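Let's break it down:
Species and Location: This is straightforward enough. The query below returns all of the (Species, Location) tuples which appear more than once in the source table. (Here table stands in for your real table name; TABLE itself is a reserved word in SQL, so substitute or quote it.)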
SELECT
Species,
Location
FROM
table
GROUP BY
Species,
Location
HAVING
COUNT(*) > 1
This gives these results:
Species Location
Cat home
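If you want to try this step yourself, here is a minimal sketch using Python's built-in sqlite3 module and the sample rows from the question. The table name animals is my own assumption (the question does not name the table), and SQLite is just a convenient engine for testing; your database may differ.

```python
import sqlite3

# Assumption: the table is called "animals"; the question does not name it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Cat", "home", 4, 9),
        (2, "Dog", "home", 4, 9),
        (3, "Cat", "home", 3, 7),
        (4, "Bunny", "home", 4, 9),
        (5, "Cat", "home", 1, 2),
    ],
)

# Keep only the (Species, Location) pairs that occur more than once.
duplicate_groups = conn.execute(
    """
    SELECT Species, Location
    FROM animals
    GROUP BY Species, Location
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_groups)  # [('Cat', 'home')]
```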
Then we want the original raw (non-grouped) records that have these known-duplicate values. We get them by doing an INNER JOIN of this query back onto the original table:
SELECT
table.*
FROM
table
INNER JOIN
(
SELECT
Species,
Location
FROM
table
GROUP BY
Species,
Location
HAVING
COUNT(*) > 1
) AS duplicates ON
table.Species = duplicates.Species AND
table.Location = duplicates.Location
(It can be tempting to write this as a WHERE subquery, but that is much less flexible and a less "relational" way of thinking about the problem.)
This then gives these results:
RecordId Species Location Foo Bar
1 Cat home 4 9
3 Cat home 3 7
5 Cat home 1 2
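The join step can be checked the same way. This is only a sketch under the same assumptions as before: the table is named animals (my choice, not the asker's), SQLite is the engine, and I added an ORDER BY so the output order is predictable.

```python
import sqlite3

# Assumption: the table is called "animals"; the question does not name it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Cat", "home", 4, 9),
        (2, "Dog", "home", 4, 9),
        (3, "Cat", "home", 3, 7),
        (4, "Bunny", "home", 4, 9),
        (5, "Cat", "home", 1, 2),
    ],
)

# Join the raw rows back onto the duplicated (Species, Location) groups.
duplicate_rows = conn.execute(
    """
    SELECT a.*
    FROM animals AS a
    INNER JOIN (
        SELECT Species, Location
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    ) AS duplicates ON
        a.Species = duplicates.Species AND
        a.Location = duplicates.Location
    ORDER BY a.RecordId
    """
).fetchall()

for row in duplicate_rows:
    print(row)
```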
Foo and Bar: This is more complicated. Here are the rules you gave:
Record 1's Foo == 4 is less than Record 3's Bar == 7.
Record 3's Bar == 7 is used to compare with Record 1 when Record 1's Bar == 9 is higher.
Record 1's Foo == 4 is greater than Record 5's Bar == 2.
My interpretation is that inside each group (records 1, 3 and 5 in this case) we only return records where Bar > MAX( Foo ). In this case, MAX( Foo ) == 4, so records 1 and 3 are included because 9 > 4 and 7 > 4 respectively, but record 5 is not because 2 > 4 is false.
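To make that interpretation concrete before turning it into SQL, here is the same rule spelled out in plain Python against the sample rows. It is only an illustration of how I read the question, not a replacement for the query; the variable names are mine.

```python
from collections import defaultdict

# Sample rows from the question: (RecordId, Species, Location, Foo, Bar)
rows = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

# Group the rows by (Species, Location).
groups = defaultdict(list)
for row in rows:
    groups[(row[1], row[2])].append(row)

matching_ids = []
for members in groups.values():
    if len(members) < 2:
        continue  # not a duplicated (Species, Location) pair
    max_foo = max(member[3] for member in members)
    # Keep the records whose Bar exceeds the largest Foo in the group.
    matching_ids.extend(member[0] for member in members if member[4] > max_foo)

print(matching_ids)  # [1, 3]
```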
We'll take the groups from the earlier query and add a MAX
aggregate:
SELECT
Species,
Location,
MAX( Foo ) AS MaxFoo
FROM
table
GROUP BY
Species,
Location
HAVING
COUNT(*) > 1
This gives these results:
Species Location MaxFoo
Cat home 4
(Because this query is a superset of the original subquery, we don't need to JOIN on a second query; we can edit the subquery in place):
SELECT
table.*
FROM
table
INNER JOIN
(
SELECT
Species,
Location,
MAX( Foo ) AS MaxFoo
FROM
table
GROUP BY
Species,
Location
HAVING
COUNT(*) > 1
) AS duplicates ON
table.Species = duplicates.Species AND
table.Location = duplicates.Location
WHERE
table.Bar > duplicates.MaxFoo
And this query gives you your desired results:
RecordId Species Location Foo Bar
1 Cat home 4 9
3 Cat home 3 7
This query also shows the advantage of JOIN operations on subqueries instead of WHERE subqueries: you can perform more operations on the data. For example, if you wanted to include MaxFoo in the output, just change SELECT table.* FROM... to SELECT table.*, duplicates.MaxFoo FROM...
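For completeness, here is a sketch of the final query with MaxFoo included in the output, run against the sample data with Python's sqlite3 module. As before, the table name animals and the ORDER BY are my own additions for the sake of a runnable, predictable example.

```python
import sqlite3

# Assumption: the table is called "animals"; the question does not name it.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Cat", "home", 4, 9),
        (2, "Dog", "home", 4, 9),
        (3, "Cat", "home", 3, 7),
        (4, "Bunny", "home", 4, 9),
        (5, "Cat", "home", 1, 2),
    ],
)

# Duplicated groups carry their MAX(Foo); keep rows whose Bar exceeds it.
final_rows = conn.execute(
    """
    SELECT a.*, duplicates.MaxFoo
    FROM animals AS a
    INNER JOIN (
        SELECT Species, Location, MAX(Foo) AS MaxFoo
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    ) AS duplicates ON
        a.Species = duplicates.Species AND
        a.Location = duplicates.Location
    WHERE a.Bar > duplicates.MaxFoo
    ORDER BY a.RecordId
    """
).fetchall()

for row in final_rows:
    print(row)
```

The last column of each printed row is the group's MaxFoo, which is 4 for both Cat/home records here.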