How do I select rows where some fields are duplicated but other fields are compared in MySQL?
I have 1 table with 4 fields. How do I find the records that have duplicate values in Species AND duplicate values in Location, and where Foo is compared to Bar in other records? (I'm looking for Foo less than Bar.)
RecordId Species Location Foo Bar
1 Cat home 4 9
2 Dog home 4 9
3 Cat home 3 7
4 Bunny home 4 9
5 Cat home 1 2
I want to find records 1 and 3. Both have Cat (in Species) AND home (in Location), and Foo in record 1 is 4, which is less than Bar in record 3 (which is 7). Record 5 doesn't match because Foo in record 1 is not less than Bar in record 5.
If I have not worded the question properly, please don't just close it. I am happy to edit if need be.
Let's break it down:
Part 1: Records that share duplicate values in Field1 and Field2:
This is straightforward enough. The query below returns all of the (Species + Location) tuples which appear more than once in the source table.
    -- "table" stands in for your real table name (TABLE is a reserved word in MySQL)
    SELECT
        Species,
        Location
    FROM
        table
    GROUP BY
        Species,
        Location
    HAVING
        COUNT(*) > 1
This gives these results:
Species Location
Cat home
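As a quick way to check the grouping step, the sketch below loads the question's five sample rows into an in-memory SQLite database and runs the same GROUP BY / HAVING query. It rests on two assumptions that are not in the original: SQLite stands in for MySQL, and the table is given the hypothetical name animals because table is a reserved word.

```python
import sqlite3

# Sample rows from the question: (RecordId, Species, Location, Foo, Bar).
ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]


def make_db():
    """Build an in-memory database; 'animals' is a hypothetical table name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
        "Foo INTEGER, Bar INTEGER)"
    )
    conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def duplicate_groups(conn):
    """Return the (Species, Location) pairs that occur more than once."""
    sql = """
        SELECT Species, Location
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    """
    return conn.execute(sql).fetchall()


print(duplicate_groups(make_db()))  # [('Cat', 'home')]
```

Dog and Bunny each form a group of one row, so HAVING COUNT(*) > 1 filters them out.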
Then we want to get the original raw (non-grouped) records that have these known-duplicate values. We do that with an INNER JOIN of this result back onto the original table:
    SELECT
        table.*
    FROM
        table
        INNER JOIN
        (
            SELECT
                Species,
                Location
            FROM
                table
            GROUP BY
                Species,
                Location
            HAVING
                COUNT(*) > 1
        ) AS duplicates ON
            table.Species = duplicates.Species AND
            table.Location = duplicates.Location
(It can be tempting to write this as a WHERE subquery, but that is much less flexible and a less "relational" way of thinking about the problem.)
This then gives these results:
RecordId Species Location Foo Bar
1 Cat home 4 9
3 Cat home 3 7
5 Cat home 1 2
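The join-back step can be tried out the same way. This is a minimal sketch under the same assumptions as before (SQLite in place of MySQL, a hypothetical table name animals); the alias a and the ORDER BY are also additions, included only to make the output order deterministic.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]


def make_db():
    """Build an in-memory database; 'animals' is a hypothetical table name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
        "Foo INTEGER, Bar INTEGER)"
    )
    conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def rows_in_duplicate_groups(conn):
    """Return the full rows whose (Species, Location) pair is duplicated."""
    sql = """
        SELECT a.*
        FROM animals AS a
        INNER JOIN (
            SELECT Species, Location
            FROM animals
            GROUP BY Species, Location
            HAVING COUNT(*) > 1
        ) AS duplicates
            ON a.Species = duplicates.Species
           AND a.Location = duplicates.Location
        ORDER BY a.RecordId
    """
    return conn.execute(sql).fetchall()


for row in rows_in_duplicate_groups(make_db()):
    print(row)  # rows 1, 3 and 5 (the three Cat/home records)
```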
Part 2: Filter based on Foo and Bar:
This is more complicated... here are the rules you gave:
Record 1: Included because its Foo == 4 is less than Record 3's Bar == 7.
Record 3: You don't explain why record 3 is included, other than it meeting the duplicate criteria, or why Record 3's Bar = 7 is used to compare with Record 1 when Record 1's Bar = 9 is higher.
Record 5: Excluded because Record 1's Foo == 4 is greater than Record 5's Bar == 2.
My interpretation is that inside each group (records 1, 3 and 5 in this case) only records where Bar > MAX( Foo ) are returned. In this case, MAX( Foo ) == 4, so records 1 and 3 are included because 9 > 4 and 7 > 4 respectively, but record 5 is not because 2 > 4 is false.
We'll take the groups from the earlier query and add a MAX aggregate:
    SELECT
        Species,
        Location,
        MAX( Foo ) AS MaxFoo
    FROM
        table
    GROUP BY
        Species,
        Location
    HAVING
        COUNT(*) > 1
This gives these results:
Species Location MaxFoo
Cat home 4
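To confirm the per-group maximum, the aggregate query can be run against the sample data. As with any such check, this is only a sketch: SQLite is assumed in place of MySQL, and animals is a hypothetical table name.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]


def make_db():
    """Build an in-memory database; 'animals' is a hypothetical table name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
        "Foo INTEGER, Bar INTEGER)"
    )
    conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def duplicate_groups_with_max_foo(conn):
    """Return each duplicated (Species, Location) pair with its largest Foo."""
    sql = """
        SELECT Species, Location, MAX(Foo) AS MaxFoo
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    """
    return conn.execute(sql).fetchall()


print(duplicate_groups_with_max_foo(make_db()))  # [('Cat', 'home', 4)]
```

The Cat/home group holds Foo values 4, 3 and 1, so its maximum is 4.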
(Because this query is a superset of the original subquery, we don't need to JOIN on a second query; we can edit it in place):
    SELECT
        table.*
    FROM
        table
        INNER JOIN
        (
            SELECT
                Species,
                Location,
                MAX( Foo ) AS MaxFoo
            FROM
                table
            GROUP BY
                Species,
                Location
            HAVING
                COUNT(*) > 1
        ) AS duplicates ON
            table.Species = duplicates.Species AND
            table.Location = duplicates.Location
    WHERE
        table.Bar > duplicates.MaxFoo
And this query gives you your desired results:
RecordId Species Location Foo Bar
1 Cat home 4 9
3 Cat home 3 7
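The complete query can be verified end to end against the question's data. The sketch below assumes SQLite in place of MySQL and uses the hypothetical table name animals; the alias a and the ORDER BY are additions for readability and a stable output order.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]


def make_db():
    """Build an in-memory database; 'animals' is a hypothetical table name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
        "Foo INTEGER, Bar INTEGER)"
    )
    conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def matching_rows(conn):
    """Rows in a duplicated group whose Bar exceeds that group's MAX(Foo)."""
    sql = """
        SELECT a.*
        FROM animals AS a
        INNER JOIN (
            SELECT Species, Location, MAX(Foo) AS MaxFoo
            FROM animals
            GROUP BY Species, Location
            HAVING COUNT(*) > 1
        ) AS duplicates
            ON a.Species = duplicates.Species
           AND a.Location = duplicates.Location
        WHERE a.Bar > duplicates.MaxFoo
        ORDER BY a.RecordId
    """
    return conn.execute(sql).fetchall()


for row in matching_rows(make_db()):
    print(row)  # records 1 and 3; record 5 is dropped because 2 > 4 is false
```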
This query also shows the advantage of JOIN operations on subqueries instead of WHERE subqueries, because you can perform more operations on the data (e.g. if you wanted to include MaxFoo in the output, just change SELECT table.* FROM... to SELECT table.*, duplicates.MaxFoo FROM...).
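To illustrate that last point, here is a sketch of the variant that also returns the group's MaxFoo alongside each row. The same assumptions apply: SQLite stands in for MySQL, and animals is a hypothetical table name.

```python
import sqlite3

ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]


def make_db():
    """Build an in-memory database; 'animals' is a hypothetical table name."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE animals ("
        "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
        "Foo INTEGER, Bar INTEGER)"
    )
    conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)
    return conn


def matching_rows_with_max_foo(conn):
    """Same filter as the final query, with MaxFoo appended as a last column."""
    sql = """
        SELECT a.*, duplicates.MaxFoo
        FROM animals AS a
        INNER JOIN (
            SELECT Species, Location, MAX(Foo) AS MaxFoo
            FROM animals
            GROUP BY Species, Location
            HAVING COUNT(*) > 1
        ) AS duplicates
            ON a.Species = duplicates.Species
           AND a.Location = duplicates.Location
        WHERE a.Bar > duplicates.MaxFoo
        ORDER BY a.RecordId
    """
    return conn.execute(sql).fetchall()


for row in matching_rows_with_max_foo(make_db()):
    print(row)  # each tuple ends with the group's MaxFoo, which is 4 here
```

Because the derived table is joined rather than hidden inside a WHERE clause, its columns are available to the outer SELECT with no further restructuring.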