
FIND_IN_SET too slow with GROUP_CONCAT (Dense Rank in MySQL)

I have a query that calculates dense ranks based on the value of a column:

SELECT id,
       score1,
       FIND_IN_SET(
         score1,
         (SELECT GROUP_CONCAT(score1 ORDER BY score1 DESC) FROM scores)
       ) AS `rank`
FROM score_news;

This is what the query results look like:

+----+--------+------+
| id | score1 | rank |
+----+--------+------+
|  1 |     15 |    1 |
|  2 |     15 |    1 |
|  3 |     14 |    3 |
|  4 |     13 |    4 |
+----+--------+------+
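To see where the time goes, here is a minimal Python sketch that emulates the query outside MySQL (`find_in_set` and `rank_via_find_in_set` are hypothetical helpers, not MySQL APIs). `FIND_IN_SET` returns the 1-based position of the first exact match in a comma-separated string, so even if the subquery is evaluated only once, every row still searches a list that holds all N scores, which is O(N²) work overall. Because the first occurrence wins, tied scores share a position and the next position is skipped; that is why the output above is 1, 1, 3, 4 (a standard rank) rather than the dense 1, 1, 2, 3. One more caveat for a 10^6-row table: MySQL truncates the `GROUP_CONCAT` result to `group_concat_max_len` bytes (1024 by default), and `FIND_IN_SET` returns 0 for any score that was cut off from the list.

```python
def find_in_set(needle: str, csv: str) -> int:
    """Emulates MySQL FIND_IN_SET: 1-based index of the first exact match, 0 if absent."""
    for position, item in enumerate(csv.split(","), start=1):
        if item == needle:
            return position
    return 0


def rank_via_find_in_set(scores: list[int]) -> list[int]:
    """Hypothetical helper mirroring the question's query, one call per row."""
    # Stand-in for: SELECT GROUP_CONCAT(score1 ORDER BY score1 DESC) FROM scores
    csv = ",".join(str(s) for s in sorted(scores, reverse=True))
    # Each row searches the whole list: O(N) per row, O(N^2) in total.
    return [find_in_set(str(s), csv) for s in scores]


if __name__ == "__main__":
    print(rank_via_find_in_set([15, 15, 14, 13]))  # [1, 1, 3, 4]
```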

The query takes N times longer when the number of scores grows N times. Is there any way I can optimize it? My table size is on the order of 10^6 rows.

NOTE: I have already tried a technique using MySQL user variables, but I get inconsistent results when I run it on a large set. On investigation, I found this in the MySQL docs:

The order of evaluation for user variables is undefined and may change based on the elements contained within a given query. In SELECT @a, @a := @a+1 ..., you might think that MySQL will evaluate @a first and then do an assignment second, but changing the query (for example, by adding a GROUP BY, HAVING, or ORDER BY clause) may change the order of evaluation...The general rule is never to assign a value to a user variable in one part of a statement and use the same variable in some other part of the same statement. You might get the results you expect, but this is not guaranteed.

My attempt with user variables:

SELECT
  a.id,
  @prev := @curr AS prev,
  @curr := a.score1 AS curr,
  @rank := IF(@rank = 0, @rank + 1, IF(@prev > @curr, @rank + @ties, @rank)) AS `rank`,
  @ties := IF(@prev = @curr, @ties + 1, 1) AS ties
FROM
  scores a,
  (
    SELECT
      @curr := NULL,
      @prev := NULL,
      @rank := 0,
      @ties := 1,
      @total := COUNT(*)
    FROM scores
    WHERE score1 IS NOT NULL
  ) b
WHERE
  score1 IS NOT NULL
ORDER BY
  score1 DESC

The solution with variables could work, but you need to first order the result set, and only then work with the variable assignments:

SELECT a.id,
       @rank := IF(@curr = a.score1, @rank, @rank + @ties) AS `rank`,
       @ties := IF(@curr = a.score1, @ties + 1, 1) AS ties,
       @curr := a.score1 AS curr
FROM   (SELECT * FROM scores WHERE score1 is NOT NULL ORDER BY score1 DESC) a,
       (SELECT @curr := null, @rank := 0, @ties := 1) b

NB: I placed the curr column last in the select clause to save one variable.
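The query above depends on MySQL reading the derived table in `score1 DESC` order and evaluating the select-list assignments left to right, neither of which is guaranteed (the documentation quoted in the question covers the second point). The logic it is aiming for is a single pass over pre-sorted rows; the Python sketch below emulates it for illustration (`rank_by_ordered_scan` and its `dense` flag are hypothetical names), with `curr`, `rank` and `ties` standing in for `@curr`, `@rank` and `@ties`. Sorting is O(N log N) and the scan is O(N). With the default `dense=False` it reproduces the answer's formula and gives 1, 1, 3, 4; adding 1 instead of `ties` on each new score gives the dense rank.

```python
def rank_by_ordered_scan(rows, dense=False):
    """rows: iterable of (id, score1) pairs with NULL scores already filtered out.

    Returns (id, rank) pairs in score1 DESC order.
    """
    # Plays the role of the ordered derived table `a`; sorted() is stable,
    # so tied rows keep their input order.
    ordered = sorted(rows, key=lambda row: row[1], reverse=True)
    curr, rank, ties = None, 0, 1  # same initial values as derived table `b`
    result = []
    for row_id, score in ordered:
        if score == curr:
            ties += 1  # same score: keep the rank, count one more tie
        else:
            # New score: a dense rank moves up by 1, a standard rank
            # skips past every row that shared the previous score.
            rank += 1 if dense else ties
            ties = 1
        curr = score
        result.append((row_id, rank))
    return result


if __name__ == "__main__":
    sample = [(1, 15), (2, 15), (3, 14), (4, 13)]
    print(rank_by_ordered_scan(sample))              # [(1, 1), (2, 1), (3, 3), (4, 4)]
    print(rank_by_ordered_scan(sample, dense=True))  # [(1, 1), (2, 1), (3, 2), (4, 3)]
```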

You can also use the following query to get a dense rank without user-defined variables:

SELECT 
  a.*,
  (SELECT 
    COUNT(DISTINCT score1) 
  FROM
    scores b 
  WHERE a.score1 < b.score1) + 1 AS `rank`
FROM
  scores a 
ORDER BY score1 DESC
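A quick way to check what this query returns is to run it against the question's four sample rows. The sketch below uses Python's built-in `sqlite3` module as a stand-in for MySQL (an assumption for portability; the `scores` schema is inferred from the question, `dense_ranks` is a hypothetical helper, and `"rank"` is double-quoted for SQLite, whereas MySQL 8.0 needs backticks because RANK became a reserved word). It returns the dense ranks 1, 1, 2, 3, not the 1, 1, 3, 4 shown in the question; replacing `COUNT(DISTINCT b.score1)` with `COUNT(*)` gives the latter. It is still a correlated subquery, so each row counts every higher score.

```python
import sqlite3

DENSE_RANK_SQL = """
SELECT a.id,
       a.score1,
       (SELECT COUNT(DISTINCT b.score1)
          FROM scores b
         WHERE a.score1 < b.score1) + 1 AS "rank"
FROM scores a
ORDER BY a.score1 DESC, a.id
"""


def dense_ranks(rows):
    """Load (id, score1) rows into an in-memory table and run the correlated subquery."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score1 INTEGER)")
        conn.executemany("INSERT INTO scores (id, score1) VALUES (?, ?)", rows)
        return conn.execute(DENSE_RANK_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in dense_ranks([(1, 15), (2, 15), (3, 14), (4, 13)]):
        print(row)  # (id, score1, rank)
```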


An index on the score1 column might help.
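As a rough sketch of the index suggestion (again in SQLite via Python's `sqlite3`, standing in for MySQL; the index name `idx_scores_score1` and the helper `window_ranks` are hypothetical), an index on `score1` can let the engine reach the higher scores with a range scan instead of a full table scan, although the correlated subquery still counts every higher score for each row. If MySQL 8.0 or later is available, the window functions `RANK()` and `DENSE_RANK()` are a different technique that avoids both the per-row subquery and user variables: one sort ranks every row. SQLite has supported the same functions since 3.25.0, which this sketch assumes, and the `SELECT` uses only standard window syntax that MySQL 8.0 also accepts.

```python
import sqlite3

WINDOW_SQL = """
SELECT id,
       score1,
       RANK()       OVER (ORDER BY score1 DESC) AS rank_with_gaps,
       DENSE_RANK() OVER (ORDER BY score1 DESC) AS dense_rank_value
FROM scores
ORDER BY score1 DESC, id
"""


def window_ranks(rows):
    """Return (id, score1, rank, dense rank) tuples for (id, score1) input rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE scores (id INTEGER PRIMARY KEY, score1 INTEGER)")
        # Hypothetical index name; an index on score1 can serve the ORDER BY.
        conn.execute("CREATE INDEX idx_scores_score1 ON scores (score1)")
        conn.executemany("INSERT INTO scores (id, score1) VALUES (?, ?)", rows)
        return conn.execute(WINDOW_SQL).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in window_ranks([(1, 15), (2, 15), (3, 14), (4, 13)]):
        print(row)  # (id, score1, RANK, DENSE_RANK)
```

The aliases avoid the bare names `rank` and `dense_rank`, which sidesteps quoting rules that differ between engines.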
