
MySQL table self-join returns too many rows

I have a table, my_table, with a primary key id (INT) and two further columns, foo (VARCHAR) and bar (DOUBLE). Each foo should appear only once in the table, with one associated bar value, but I know that several rows share the same foo while having different bars. How do I get a list of the rows that have the same foo value but whose bars differ (say, by more than 10)? I tried:

SELECT t1.id, t1.bar, t2.id, t2.bar, t1.foo
    FROM my_table t1, my_table t2
    WHERE t1.foo=t2.foo
    AND t1.bar - t2.bar > 10.;

But I get lots and lots of results (more than the total number of rows in my_table). I feel I must be doing something very obviously stupid, but I can't see my mistake.

Ah, thanks SWeko: I think I understand why I'm getting so many results, then. Is there a way in SQL of counting, for each foo, the number of rows with that foo but bars differing by more than 10.?

To answer your latest question:

Is there a way in SQL of counting, for each foo, the number of rows with that foo but bars differing by more than 10.?

A query like this should work:

select t1.id, t1.foo, t1.bar, count(t2.id) as dupes
from my_table t1
  left outer join my_table t2 on t1.foo=t2.foo and (t1.bar - t2.bar) > 10
group by t1.id, t1.foo, t1.bar; 
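
As a rough check of what this query returns, here is a minimal sketch that runs it on a tiny, made-up table. It uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made only so the example runs on its own), and the five sample rows are hypothetical. The second query, which groups by foo alone, is not from the answer above; it is one possible reading of "for each foo", counting the ordered pairs whose bars are more than 10 apart.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, foo VARCHAR(32), bar DOUBLE)"
)
# Hypothetical sample rows: three A-rows spread widely, two B-rows close together.
conn.executemany(
    "INSERT INTO my_table (id, foo, bar) VALUES (?, ?, ?)",
    [(1, "A", 1.0), (2, "A", 15.0), (3, "A", 30.0), (4, "B", 7.0), (5, "B", 8.0)],
)

# The query from the answer: for every row, count the rows with the same foo
# whose bar is smaller by more than 10. The LEFT OUTER JOIN keeps rows with 0 matches.
per_row = conn.execute(
    """
    SELECT t1.id, t1.foo, t1.bar, COUNT(t2.id) AS dupes
    FROM my_table t1
      LEFT OUTER JOIN my_table t2
        ON t1.foo = t2.foo AND (t1.bar - t2.bar) > 10
    GROUP BY t1.id, t1.foo, t1.bar
    ORDER BY t1.id
    """
).fetchall()

# Variant (not from the answer): one count per foo instead of one per row.
per_foo = conn.execute(
    """
    SELECT t1.foo, COUNT(*) AS pairs
    FROM my_table t1
      JOIN my_table t2
        ON t1.foo = t2.foo AND (t1.bar - t2.bar) > 10
    GROUP BY t1.foo
    ORDER BY t1.foo
    """
).fetchall()

for row in per_row:
    print(row)
print(per_foo)
```

With this sample data, only the A-rows with bar 15 and 30 get a non-zero dupes count, and foo='B' does not appear in the per-foo result at all because its two bars are only 1 apart.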

If, for example, you have 5 rows with foo='A' and 10 rows with foo='B', the self-join will pair each A-row with every A-row (including itself) and each B-row with every B-row, so a simple

SELECT t1.id, t1.bar, t2.id, t2.bar, t1.foo
FROM my_table t1, my_table t2
WHERE t1.foo=t2.foo

will return 5*5+10*10=125 rows. Filtering the values will cut that number down, but you might still have (significantly) more rows than you started with. E.g., if we presume that the ten B-rows have bar values of 5, 10, 15, ..., 50, each of them will be matched with the following number of rows:

bar = 5  - 0 rows that have bar less than -5
bar = 10 - 0 rows that have bar less than 0
bar = 15 - 0 rows that have bar less than 5
bar = 20 - 1 row that has bar less than 10
bar = 25 - 2 rows that have bar less than 15
bar = 30 - 3 rows that have bar less than 20
bar = 35 - 4 rows that have bar less than 25
bar = 40 - 5 rows that have bar less than 30
bar = 45 - 6 rows that have bar less than 35
bar = 50 - 7 rows that have bar less than 40

so you will have 28 results for the B-rows alone, and that number grows with the square of the number of rows that share the same value of foo.
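
The two counts above (125 and 28) can be reproduced with a short sketch. It uses Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made only to keep the example self-contained. The bar values of the A-rows are arbitrary placeholders, since only their number matters for the unfiltered count.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, foo VARCHAR(32), bar DOUBLE)"
)

# 5 A-rows with placeholder bar values, and 10 B-rows with bar = 5, 10, ..., 50.
rows = [("A", float(i)) for i in range(5)]
rows += [("B", float(b)) for b in range(5, 55, 5)]
conn.executemany("INSERT INTO my_table (foo, bar) VALUES (?, ?)", rows)

# Self-join on foo only: every row pairs with every row that shares its foo.
unfiltered = conn.execute(
    "SELECT COUNT(*) FROM my_table t1, my_table t2 WHERE t1.foo = t2.foo"
).fetchone()[0]

# Same join restricted to the B-rows, keeping pairs whose bars differ by more than 10.
filtered_b = conn.execute(
    """
    SELECT COUNT(*)
    FROM my_table t1, my_table t2
    WHERE t1.foo = t2.foo
      AND t1.foo = 'B'
      AND t1.bar - t2.bar > 10
    """
).fetchone()[0]

print(unfiltered)  # 5*5 + 10*10 = 125
print(filtered_b)  # 0+0+0+1+2+3+4+5+6+7 = 28
```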

Have you tried the same thing with the "new" JOIN syntax?

    SELECT t1.*,
           t2.*
      FROM my_table t1
      JOIN my_table t2 ON t1.foo = t2.foo
     WHERE (t1.bar - t2.bar) > 10

I don't expect that to fix your problem, but that's at least where I would start.

I might also try this:

    SELECT t1.*,
           t2.*
      FROM my_table t1
      JOIN my_table t2 ON t1.foo = t2.foo AND t1.id != t2.id
     WHERE (t1.bar - t2.bar) > 10
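
One thing worth checking about the t1.id != t2.id condition: a row paired with itself has a bar difference of exactly 0, which is never greater than 10, so the extra condition should not remove anything here. The sketch below compares both queries on made-up data (ten rows with foo='B' and bar = 5, 10, ..., 50), again using Python's built-in sqlite3 module as a stand-in for MySQL, which is an assumption made so the example is self-contained.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, foo VARCHAR(32), bar DOUBLE)"
)
# Hypothetical data: ten B-rows with bar = 5, 10, ..., 50.
conn.executemany(
    "INSERT INTO my_table (foo, bar) VALUES (?, ?)",
    [("B", float(b)) for b in range(5, 55, 5)],
)

# Explicit JOIN syntax without the id check.
without_id_filter = conn.execute(
    """
    SELECT t1.id, t2.id
    FROM my_table t1
    JOIN my_table t2 ON t1.foo = t2.foo
    WHERE (t1.bar - t2.bar) > 10
    ORDER BY t1.id, t2.id
    """
).fetchall()

# Same query with self-pairs explicitly excluded.
with_id_filter = conn.execute(
    """
    SELECT t1.id, t2.id
    FROM my_table t1
    JOIN my_table t2 ON t1.foo = t2.foo AND t1.id != t2.id
    WHERE (t1.bar - t2.bar) > 10
    ORDER BY t1.id, t2.id
    """
).fetchall()

# Both queries return the same pairs, because a self-pair never passes the bar filter.
print(with_id_filter == without_id_filter)
print(len(with_id_filter))
```

The id check would only start to matter if the filter could be satisfied by a row paired with itself, for example a threshold below zero or a >= 0 comparison.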
