
How do I select rows where some fields are duplicated but other fields are compared in MySQL?

I have one table with four fields. How do I find the records that have duplicate values in Species and duplicate values in Location, where Foo is compared to Bar in other records? (I'm looking for Foo less than Bar.)

RecordId    Species   Location    Foo    Bar
1           Cat       home        4      9
2           Dog       home        4      9
3           Cat       home        3      7
4           Bunny     home        4      9
5           Cat       home        1      2

I want to find records 1 and 3. Both have Cat (in Species) and home (in Location), and Foo in record 1 is 4, which is less than Bar in record 3 (which is 7). Record 5 doesn't match because Foo in record 1 is not less than Bar in record 5.

If I have not worded the question properly, please don't just close it. I am happy to edit if need be.

Let's break it down:

Part 1: Records that share duplicate values in Species and Location:

This is straightforward enough. The query below returns every (Species, Location) tuple that appears more than once in the source table. Here table stands in for your actual table name (TABLE itself is a reserved word in MySQL, so it would need backticks if used literally).

SELECT
    Species,
    Location
FROM
    table
GROUP BY
    Species,
    Location
HAVING
    COUNT(*) > 1

This gives these results:

Species   Location
Cat       home
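
If you want to try the grouping step without a MySQL server, here is a minimal sketch using Python's built-in sqlite3 module. It assumes SQLite treats this GROUP BY / HAVING query the same way MySQL does, and the table name animals is a hypothetical stand-in for the placeholder table above.

```python
import sqlite3

# Sample rows from the question: (RecordId, Species, Location, Foo, Bar).
ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

conn = sqlite3.connect(":memory:")
# "animals" is an assumed name; the answer's "table" is only a placeholder.
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany("INSERT INTO animals VALUES (?, ?, ?, ?, ?)", ROWS)

# Keep only the (Species, Location) pairs that occur more than once.
duplicate_groups = conn.execute(
    """
    SELECT Species, Location
    FROM animals
    GROUP BY Species, Location
    HAVING COUNT(*) > 1
    """
).fetchall()

print(duplicate_groups)  # prints [('Cat', 'home')]
```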

Then we want to get the original raw (non-grouped) records that have these known-duplicate values. We do that with an INNER JOIN of this result back onto the original table:

SELECT
    table.*
FROM
    table
    INNER JOIN
    (
        SELECT
            Species,
            Location
        FROM
            table
        GROUP BY
            Species,
            Location
        HAVING
            COUNT(*) > 1
    ) AS duplicates ON
        table.Species  = duplicates.Species AND
        table.Location = duplicates.Location

(It can be tempting to write this as a WHERE subquery, but that is much less flexible and a less "relational" way of thinking about the problem.)

This then gives these results:

RecordId    Species   Location  Foo    Bar
1           Cat       home      4      9
3           Cat       home      3      7
5           Cat       home      1      2
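
For comparison, the sketch below runs the JOIN form next to one possible WHERE-subquery form (a correlated EXISTS) and checks that both pick the same rows. It uses Python's sqlite3 module rather than MySQL, the table name animals is an assumption, and the EXISTS version additionally assumes RecordId is unique.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "animals" is an assumed name standing in for the placeholder "table".
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Cat", "home", 4, 9),
        (2, "Dog", "home", 4, 9),
        (3, "Cat", "home", 3, 7),
        (4, "Bunny", "home", 4, 9),
        (5, "Cat", "home", 1, 2),
    ],
)

# JOIN form: join each row to the list of duplicated (Species, Location) pairs.
JOIN_SQL = """
SELECT a.RecordId
FROM animals AS a
INNER JOIN (
    SELECT Species, Location
    FROM animals
    GROUP BY Species, Location
    HAVING COUNT(*) > 1
) AS duplicates
    ON a.Species = duplicates.Species
   AND a.Location = duplicates.Location
ORDER BY a.RecordId
"""

# WHERE form: keep a row if some *other* row shares its Species and Location.
EXISTS_SQL = """
SELECT a.RecordId
FROM animals AS a
WHERE EXISTS (
    SELECT 1
    FROM animals AS other
    WHERE other.Species = a.Species
      AND other.Location = a.Location
      AND other.RecordId <> a.RecordId
)
ORDER BY a.RecordId
"""

join_ids = [row[0] for row in conn.execute(JOIN_SQL)]
exists_ids = [row[0] for row in conn.execute(EXISTS_SQL)]
print(join_ids, exists_ids)  # prints [1, 3, 5] [1, 3, 5]
```

Both forms select the same records here, but only the JOIN form exposes the subquery's columns to the outer query, which is what Part 2 relies on.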

Part 2: Filter based on Foo and Bar:

This is more complicated. Here are the rules you gave:

  • Record 1: Included because its Foo == 4 is less than record 3's Bar == 7.
  • Record 3: You don't explain why record 3 is included, other than that it meets the duplicate criteria, or why record 3's Bar == 7 is the value compared against record 1 when record 1's own Bar == 9 is higher.
  • Record 5: Excluded because record 1's Foo == 4 is greater than record 5's Bar == 2.

My interpretation is that, inside each group (records 1, 3 and 5 in this case), you only want the records where Bar > MAX(Foo). In this case MAX(Foo) == 4, so records 1 and 3 are included because 9 > 4 and 7 > 4 respectively, but record 5 is not because 2 > 4 is false.
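
To make that interpretation concrete before turning it into SQL, here is a small plain-Python sketch of the same rule applied to the sample rows. It only illustrates my reading of the requirement (Bar > MAX(Foo) within each duplicated group); the variable names are made up for the example.

```python
from collections import defaultdict

# Sample rows from the question: (RecordId, Species, Location, Foo, Bar).
ROWS = [
    (1, "Cat", "home", 4, 9),
    (2, "Dog", "home", 4, 9),
    (3, "Cat", "home", 3, 7),
    (4, "Bunny", "home", 4, 9),
    (5, "Cat", "home", 1, 2),
]

# Bucket the rows by their (Species, Location) pair.
groups = defaultdict(list)
for row in ROWS:
    _, species, location, _, _ = row
    groups[(species, location)].append(row)

matches = []
for members in groups.values():
    if len(members) < 2:
        continue  # this (Species, Location) pair is not duplicated
    max_foo = max(foo for _, _, _, foo, _ in members)
    # Keep the records whose Bar exceeds the largest Foo in the group.
    matches.extend(rid for rid, _, _, _, bar in members if bar > max_foo)

print(sorted(matches))  # prints [1, 3]
```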

We'll take the groups from the earlier query and add a MAX aggregate:

SELECT
    Species,
    Location,
    MAX( Foo ) AS MaxFoo
FROM
    table
GROUP BY
    Species,
    Location
HAVING
    COUNT(*) > 1

This gives these results:

Species   Location    MaxFoo
Cat       home        4

(Because this query is a superset of the original subquery, we don't need to JOIN on a second query; we can edit the existing subquery in place):

SELECT
    table.*
FROM
    table
    INNER JOIN
    (
        SELECT
            Species,
            Location,
            MAX( Foo ) AS MaxFoo
        FROM
            table
        GROUP BY
            Species,
            Location
        HAVING
            COUNT(*) > 1
    ) AS duplicates ON
        table.Species  = duplicates.Species AND
        table.Location = duplicates.Location
WHERE
    table.Bar > duplicates.MaxFoo

And this query gives you your desired results:

RecordId    Species   Location  Foo    Bar
1           Cat       home      4      9
3           Cat       home      3      7

This query also shows the advantage of JOIN operations on subqueries over WHERE subqueries: you can perform more operations on the data (e.g. if you wanted to include MaxFoo in the output, just change SELECT table.* FROM... to SELECT table.*, duplicates.MaxFoo FROM...).
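
As a rough end-to-end check, the sketch below runs the final query with MaxFoo added to the output. It again uses Python's sqlite3 module instead of MySQL (assuming the two behave the same for this query), and the table name animals is a hypothetical replacement for the placeholder table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "animals" is an assumed name standing in for the placeholder "table".
conn.execute(
    "CREATE TABLE animals ("
    "RecordId INTEGER PRIMARY KEY, Species TEXT, Location TEXT, "
    "Foo INTEGER, Bar INTEGER)"
)
conn.executemany(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Cat", "home", 4, 9),
        (2, "Dog", "home", 4, 9),
        (3, "Cat", "home", 3, 7),
        (4, "Bunny", "home", 4, 9),
        (5, "Cat", "home", 1, 2),
    ],
)

# Final query, with the per-group MaxFoo exposed as an extra output column.
rows = conn.execute(
    """
    SELECT a.RecordId, a.Species, a.Location, a.Foo, a.Bar, duplicates.MaxFoo
    FROM animals AS a
    INNER JOIN (
        SELECT Species, Location, MAX(Foo) AS MaxFoo
        FROM animals
        GROUP BY Species, Location
        HAVING COUNT(*) > 1
    ) AS duplicates
        ON a.Species = duplicates.Species
       AND a.Location = duplicates.Location
    WHERE a.Bar > duplicates.MaxFoo
    ORDER BY a.RecordId
    """
).fetchall()

for row in rows:
    print(row)
```

With the sample data this should return records 1 and 3, each carrying MaxFoo = 4 for the Cat/home group.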
