SQL filter rows without join

I'm always irked by unnecessary joins. In this case, though, I wonder whether it's possible to avoid a join altogether.

This is an example of the table I have:

id | team | score
 1 |   1  |  300
 2 |   1  |  257
 3 |   2  |  127
 4 |   2  |  533
 5 |   3  |  459

This is what I want:

team | score | id
  1  |  300  |  1
  2  |  533  |  4
  3  |  459  |  5

I tried a query like this (basically: who is the best player on each team?):

SELECT team, MAX(score) AS score, id
FROM my_table
GROUP BY team

But I get something like this:

team | score | id
  1  |  300  |  1
  2  |  533  |  3
  3  |  459  |  5

But it's not the third player who got 533 points, so the result is inconsistent.

Is it possible to get trustworthy results without joining the table with itself? How can I achieve that?

You can use variables:

SELECT id, team, score
FROM (
  SELECT id, team, score,
         @seq := IF(@t = team, @seq, 
                   IF(@t := team, @seq + 1, @seq + 1)) AS seq,
         @grp := IF(@t2 = team, @grp + 1, 
                   IF(@t2 := team, 1, 1)) AS grp                 
  FROM mytable
  CROSS JOIN (SELECT @seq := 0, @t := 0, @grp := 0, @t2 := 0) AS vars
  ORDER BY score DESC) AS t
WHERE seq <= 3 AND grp = 1

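Variable @seq is incremented each time a new team is met as the records are processed in descending score order. Variable @grp is used to enumerate records within each team partition. Records with grp = 1 are the ones having the greatest score value within the team slice. Note that the condition seq <= 3 is tied to the three teams in the sample data, so treat the query as specific to that data rather than as a general solution.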

Demo here
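To make the intent of the grp = 1 filter easier to follow, here is a minimal Python sketch of the idea described above: walk the rows in descending score order, number them within each team, and keep the rows numbered 1. It is not a line-by-line translation of the SQL, and the function name best_per_team is made up for illustration.

```python
# Sample rows from the question: (id, team, score).
rows = [(1, 1, 300), (2, 1, 257), (3, 2, 127), (4, 2, 533), (5, 3, 459)]


def best_per_team(rows):
    """Number rows within each team in descending score order
    and keep only the rows numbered 1 (hypothetical helper)."""
    grp = {}   # team -> how many rows of that team have been seen so far
    best = []
    for row_id, team, score in sorted(rows, key=lambda r: r[2], reverse=True):
        grp[team] = grp.get(team, 0) + 1
        if grp[team] == 1:  # plays the role of "grp = 1" in the query
            best.append((row_id, team, score))
    return best


result = best_per_team(rows)
print(result)  # [(4, 2, 533), (5, 3, 459), (1, 1, 300)]
```

The rows come out in descending score order because that is the order in which they are processed.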

You can do it without joins by using a correlated subquery like this:

SELECT id, team, score
FROM table1 a
WHERE score = (SELECT MAX(score) FROM table1 b WHERE a.team = b.team);

However, on big tables this can be very slow, because the subquery may have to be evaluated once for every row in the table.

That said, there's nothing wrong with using a join to filter results like this:

-- Columns are qualified with a. because team and score exist on both sides of the join
SELECT a.id, a.team, a.score
FROM table1 a
INNER JOIN (
    SELECT MAX(score) AS score, team
    FROM table1
    GROUP BY team
    ) b ON a.score = b.score AND a.team = b.team

Although a join is itself fairly expensive, this way the database only has to run two queries (the grouped subquery and the outer query), regardless of how many rows are in the table. So on big tables this method can still be hundreds, if not thousands, of times faster than the first method with the correlated subquery.
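As a quick sanity check, the sketch below runs both queries against the sample data from the question and confirms they return the same rows. It uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real server (an assumption; the question appears to be about MySQL), and it does not measure speed. The ORDER BY clauses are added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite database loaded with the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE table1 (id INTEGER PRIMARY KEY, team INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO table1 (id, team, score) VALUES (?, ?, ?)",
    [(1, 1, 300), (2, 1, 257), (3, 2, 127), (4, 2, 533), (5, 3, 459)],
)

# Correlated subquery: the inner SELECT refers to the outer row's team.
subquery_sql = """
    SELECT id, team, score
    FROM table1 a
    WHERE score = (SELECT MAX(score) FROM table1 b WHERE a.team = b.team)
    ORDER BY team
"""

# Join against a derived table holding one MAX(score) per team.
join_sql = """
    SELECT a.id, a.team, a.score
    FROM table1 a
    INNER JOIN (
        SELECT MAX(score) AS score, team
        FROM table1
        GROUP BY team
    ) b ON a.score = b.score AND a.team = b.team
    ORDER BY a.team
"""

subquery_rows = conn.execute(subquery_sql).fetchall()
join_rows = conn.execute(join_sql).fetchall()
print(subquery_rows)  # [(1, 1, 300), (4, 2, 533), (5, 3, 459)]
print(join_rows)      # [(1, 1, 300), (4, 2, 533), (5, 3, 459)]
```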

Unfortunately, MySQL versions before 8.0 don't support window functions like ROW_NUMBER(), which could have solved this easily.

There are several ways of doing that:

NOT EXISTS() :

SELECT * FROM YourTable t
WHERE NOT EXISTS(SELECT 1 FROM YourTable s
                 WHERE t.team = s.team AND s.score > t.score)

IN() :

SELECT * FROM YourTable t
WHERE (t.team,t.score) IN(SELECT s.team,MAX(s.score)
                          FROM YourTable s
                          GROUP BY s.team)

A correlated query:

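-- One row per team: its top score and the id of a row holding that score
SELECT DISTINCT t.team,
       (SELECT s.score FROM YourTable s
        WHERE s.team = t.team
        ORDER BY s.score DESC
        LIMIT 1) AS score,
       (SELECT s.id FROM YourTable s
        WHERE s.team = t.team
        ORDER BY s.score DESC
        LIMIT 1) AS id
FROM YourTable t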

Or a join which I understand you already have.

EDIT: I take that back; you can do it with variables, as in @GiorgosBetsos's solution.
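One thing worth knowing about the NOT EXISTS() and IN() variants above is how they behave when two players tie for the top score in a team. The sketch below checks this with Python's built-in sqlite3 module (an in-memory stand-in, not the asker's actual database). The row with id 6 is a hypothetical addition to the sample data, there only to create a tie.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE YourTable (id INTEGER PRIMARY KEY, team INTEGER, score INTEGER)"
)
rows = [
    (1, 1, 300), (2, 1, 257), (3, 2, 127), (4, 2, 533), (5, 3, 459),
    (6, 3, 459),  # hypothetical extra row: ties with id 5 in team 3
]
conn.executemany("INSERT INTO YourTable (id, team, score) VALUES (?, ?, ?)", rows)

# Keep a row only if no row of the same team has a strictly higher score.
not_exists_sql = """
    SELECT t.id, t.team, t.score FROM YourTable t
    WHERE NOT EXISTS (SELECT 1 FROM YourTable s
                      WHERE t.team = s.team AND s.score > t.score)
    ORDER BY t.team, t.id
"""

# Keep a row only if its (team, score) pair is one of the per-team maxima.
in_sql = """
    SELECT t.id, t.team, t.score FROM YourTable t
    WHERE (t.team, t.score) IN (SELECT s.team, MAX(s.score)
                                FROM YourTable s
                                GROUP BY s.team)
    ORDER BY t.team, t.id
"""

not_exists_rows = conn.execute(not_exists_sql).fetchall()
in_rows = conn.execute(in_sql).fetchall()
print(not_exists_rows)  # [(1, 1, 300), (4, 2, 533), (5, 3, 459), (6, 3, 459)]
print(in_rows)          # [(1, 1, 300), (4, 2, 533), (5, 3, 459), (6, 3, 459)]
```

Both variants return every tied row, so a team can appear more than once. If exactly one row per team is required, some tie-breaking rule (for example, lowest id) has to be added.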

You could do something like this:

-- The alias is rnk rather than Rank because RANK is a reserved word in MySQL 8.0+
SELECT team, score, id
FROM (SELECT *,
             RANK() OVER (PARTITION BY team ORDER BY score DESC) AS rnk
      FROM my_table) ranked_result
WHERE rnk = 1;

Some info on Rank functionality: Clicketyclickclick
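For anyone who wants to try the window-function approach without a database server, here is a small sketch using Python's built-in sqlite3 module on the sample data from the question. It assumes the bundled SQLite is version 3.25 or newer, which is when SQLite gained window functions; the rank column is aliased as rnk, a name chosen here for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE my_table (id INTEGER PRIMARY KEY, team INTEGER, score INTEGER)"
)
conn.executemany(
    "INSERT INTO my_table (id, team, score) VALUES (?, ?, ?)",
    [(1, 1, 300), (2, 1, 257), (3, 2, 127), (4, 2, 533), (5, 3, 459)],
)

# Rank rows inside each team by descending score, then keep rank 1.
ranked_sql = """
    SELECT team, score, id
    FROM (SELECT id, team, score,
                 RANK() OVER (PARTITION BY team ORDER BY score DESC) AS rnk
          FROM my_table) ranked_result
    WHERE rnk = 1
    ORDER BY team
"""

ranked_rows = conn.execute(ranked_sql).fetchall()
print(ranked_rows)  # [(1, 300, 1), (2, 533, 4), (3, 459, 5)]
```

This matches the result the question asks for. RANK() gives tied rows the same rank, so a tie for first place would return more than one row for that team; ROW_NUMBER() would return exactly one.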
