简体   繁体   中英

Which msql query is faster : multiple selects or join?

Need to decide which query is faster:

Query1

select xyz.user, count(*) as score
from xyz join
     (select qid, min(time) as mintime
      from xyz
      group by qid
     ) q
     on xyz.qid = q.qid and xyz.time = q.mintime
group by xyz.user;

Query2

select user,count(*) 
  from (select * 
          from (select * 
                  from xyz 
                 order by time ASC
               ) as temp1 
         group by temp1.qid
        ) As temp2
  group by temp2.user

both return the number of times that a user is "first" for each qid.

DB:

CREATE TABLE xyz (
id INT PRIMARY KEY AUTO_INCREMENT,
  user VARCHAR(20),
  time INT,
  qid INT
);

INSERT INTO xyz VALUES (    1  ,  'abc' ,  15  , 1);
INSERT INTO xyz VALUES (    2  ,  'abc' ,  6  , 1);
INSERT INTO xyz VALUES (    3  ,  'xyz' ,  11  , 1);
INSERT INTO xyz VALUES (    4  ,  'abc' ,  4  , 1);
INSERT INTO xyz VALUES (    5  ,  'xyz' ,  13  , 2);
INSERT INTO xyz VALUES (    6  ,  'abc' ,  11  ,2);
INSERT INTO xyz VALUES (    7  ,  'abc' ,  9  , 3);
INSERT INTO xyz VALUES (    8  ,  'xyz' ,  10  , 3);
INSERT INTO xyz VALUES (    9  ,  'xyz' ,  2  , 3);
INSERT INTO xyz VALUES (    10  ,  'xyz' ,  2  , 4);

Edit: As suggested replaced better with faster.

Your first query contains this subquery.

 select qid, min(time) as mintime
   from xyz
  group by qid

It can be made very efficient by a compound index on (qid,time) . MySQL will satisfy that query with a so-called loose index scan. So if I were you I would go with your first query.

But you should use EXPLAIN to make sure I am right. You should revisit this question sometime in the future when your tables have 100 times the number of rows they have now.

Before anything else you should add indexes to your table on the columns you use for search or comparison:

ALTER TABLE `xyz`
ADD INDEX (`qid`),
ADD INDEX (`time`);

Then put EXPLAIN in front of each query and check the results against the recommendations you can find in the manual. Look closely the sections about "Join Types" and "Extra Information" .

With or without indexes, avoid the second query. It is the worst and it cannot be improved.

I propose you an alternative query that produces the same results and is even faster than your first approach:

SELECT uif.user, COUNT(*) AS score
FROM xyz uif                  # "uif" from "user is first"
  LEFT JOIN xyz sm            # "sm" from "smaller time"
    ON uif.qid = sm.qid AND sm.time < uif.time
WHERE sm.time IS NULL         # keep only when there is no "smaller time"
GROUP BY uif.user

It joins the xyz table (aliased as uif ) against itself (aliased as sm ). Each row from uif is paired with all the rows from sm that have the same qid and smaller times ( sm.time < uif.time ). The LEFT JOIN ensures that all the rows from uif will appear in the joined set. When a row from uif does not have a pair from sm (because there is no row in sm having a smaller time), the columns from sm are filled with NULL .

The WHERE condition keeps only these rows from uif that have no pair in sm (there is no "smaller time"); that means it keeps only the rows from uif that have the smaller times.

The SELECT and GROUP BY clauses take care of your original goal (return the number of times). Always put only columns from uid in the SELECT clauses (because those of sm are all NULL , of course).

This is a typical performance scenario regarding speed. As for any such circumstance you have to test and measure.

That in itself is a quite tricky task since your queries will be cached by the db engine after the first run and the following executions will be considerably faster. The test and measure of performance in a system needs you to consider a lot of variables to be done correctly.

On the other side:

My estimation is that the query with the JOIN would be the fastest. Modern sql DB engines are good at optimization and a JOIN would allow the engine to freely optimize the query.

SELECT user, count(*) AS score
FROM xyz JOIN
     (SELECT qid, min(time) AS mintime
      FROM xyz
      GROUP by qid
     ) q
     ON qid = q.qid AND time = q.mintime
GROUP BY user;

I believe xyz.gid and xyz.time is not necessary

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM