How to optimize select query with case statements?

Question

I have 3 tables with 1,000,000+ records. My select query runs for hours. How can I optimize it? I'm a newbie.

I tried adding an index on name, but the query still takes hours to run.

Like this,

ALTER TABLE table2 ADD INDEX(name);

and like this also,

CREATE INDEX INDEX1 ON table2(name);

SELECT MS.*, P.Counts FROM 
(SELECT M.*, 
TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,               
CASE V.name 
WHEN 'text' THEN  M.name 
WHEN V.name IS NULL THEN M.name 
ELSE V.name 
END col1  
FROM table1 M 
LEFT JOIN table2 V ON M.id=V.id) AS MS
LEFT JOIN 
(select E.id, count(E.id) Counts 
from table3 E
where E.field2 = 'value1' 
group by E.id) AS P
ON MS.id=P.id;

Explain <above query>;

output:

+----+-------------+------------+------------+-------+---------------------------------------------+------------------+---------+------------------------+---------+----------+-----------------------------------------------------------------+
| id | select_type | table      | partitions | type  | possible_keys                               | key              | key_len | ref                    | rows    | filtered | Extra                                                           |
+----+-------------+------------+------------+-------+---------------------------------------------+------------------+---------+------------------------+---------+----------+-----------------------------------------------------------------+
|  1 | PRIMARY     | M          | NULL       | ALL   | NULL                                        | NULL             | NULL    | NULL                   |  344763 |   100.00 | NULL                                                            |
|  1 | PRIMARY     | <derived3> | NULL       | ref   | <auto_key0>                                 | <auto_key0>      | 8       | CP.M.id                |      10 |   100.00 | NULL                                                            |
|  1 | PRIMARY     | V          | NULL       | index | NULL                                        | INDEX1           | 411     | NULL                   | 1411083 |   100.00 | Using where; Using index; Using join buffer (Block Nested Loop) |
|  3 | DERIVED     | E          | NULL       | ref   | PRIMARY,f2,f3                               | f2               | 43      | const                  |  966442 |   100.00 | Using index                                                     |
+----+-------------+------------+------------+-------+---------------------------------------------+------------------+---------+------------------------+---------+----------+-----------------------------------------------------------------+

I expect to get the result in less than 1 minute.

The query indented for clarity.

SELECT MS.*, P.Counts
  FROM  (
           SELECT M.*, 
                  TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,               
             CASE V.name 
                  WHEN 'text' THEN  M.name 
                  WHEN V.name IS NULL THEN M.name 
                  ELSE V.name 
                  END col1  
             FROM table1 M 
             LEFT JOIN table2 V ON M.id=V.id
      ) AS MS
  LEFT JOIN ( 
                  select E.id, count(E.id) Counts 
                   from table3 E
                   where E.field2 = 'value1' 
                   group by E.id
    ) AS P ON MS.id=P.id;

Answer 1

Your query has no filtering predicate, so it's essentially retrieving all the rows. That is 1,000,000+ rows from table1. It then joins them with table2, and then with another derived table.

Why do you expect this query to be fast? A massive query like this one will normally run as a batch process at night. I assume this query is not for an online process, right?

Maybe you need to rethink the process. Do you really need to process millions of rows at once interactively? Will the user read a million rows in the web page?
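If the rows are meant to be read on a page, one option is to fetch them in slices rather than all at once. The following is only a sketch, assuming table1.id is an indexed, increasing key; the starting value 0 and the page size of 100 are illustrative and not taken from the question.

```sql
-- Keyset pagination sketch: read table1 one page at a time.
SELECT M.*,
       TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age
FROM table1 M
WHERE M.id > 0      -- assumption: replace 0 with the last id of the previous page
ORDER BY M.id
LIMIT 100;          -- assumption: page size
```

With a slice like this as the driving set, the joins to table2 and table3 would only need to find matches for the rows on that page.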

Answer 2

For starters, you return the same result for 'col1' when V.name is NULL and when V.name = 'text'. You can therefore move that extra condition into your join with table2 and use the IFNULL function.
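A side note on the CASE in the question: in the simple form CASE V.name WHEN ..., each WHEN value is compared to V.name with "=". The branch WHEN V.name IS NULL therefore compares V.name with the result of (V.name IS NULL) instead of testing for NULL, and for a NULL name that comparison yields NULL, so the row falls through to ELSE. If the intent is the one described above, a searched CASE states it directly. This is a sketch using the table and column names from the question:

```sql
-- Searched CASE: each WHEN is a full condition, so IS NULL works as a test.
SELECT M.id,
       CASE
           WHEN V.name IS NULL  THEN M.name
           WHEN V.name = 'text' THEN M.name
           ELSE V.name
       END AS col1
FROM table1 M
LEFT JOIN table2 V ON M.id = V.id;
```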

As you are filtering table3 by field2, you could probably create an index on table3 that includes field2.
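A sketch of such an index, assuming the column names from the query; the index name idx_table3_field2_id is made up for illustration. Note that the EXPLAIN output in the question already shows an index named f2 being used on table3 with "Using index", so an equivalent index may well exist already; it is worth checking first.

```sql
-- List the indexes that already exist on table3.
SHOW INDEX FROM table3;

-- Composite index: field2 serves the WHERE filter, id serves the GROUP BY.
-- The index name is hypothetical.
CREATE INDEX idx_table3_field2_id ON table3 (field2, id);
```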

You should also check whether you can add a filter on any of those tables; if you can, consider using a stored procedure to get the results.

Also, I don't see why you need to wrap the first join in the derived table 'MS'; you can easily do all the joins in one go like this:

SELECT 
    M.*, 
    TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,               
    IFNULL(V.name, M.name) as col1,
    P.Counts 
FROM table1 M 
LEFT JOIN table2 V ON M.id=V.id AND V.name <> 'text'
LEFT JOIN 
(SELECT 
    E.id, 
    COUNT(E.id) Counts 
FROM table3 E
WHERE E.field2 = 'value1' 
GROUP BY E.id) AS P ON M.id=P.id;

I'm also assuming that you have clustered indexes on the id fields in all three tables. Even so, with no filter and millions of records, this will always be a heavy query. At the very least, you are doing a full table scan on table1.
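That assumption is worth checking against the EXPLAIN output in the question: the row for V shows possible_keys NULL, a scan of INDEX1 over about 1.4 million rows, and "Using join buffer (Block Nested Loop)". This suggests that table2 has no usable index on id for the join. If that reading is right, something like the following may help; the index name idx_table2_id is hypothetical, and the sketch assumes id is not unique in table2 (if it is unique, a primary key would serve instead).

```sql
-- Check which indexes table2 currently has.
SHOW INDEX FROM table2;

-- Assumption: table2.id is not indexed yet; the index name is made up.
ALTER TABLE table2 ADD INDEX idx_table2_id (id);
```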

I've added this information after your comment.

I mentioned clustered indexes. According to the official MySQL documentation on indexes, when you define a PRIMARY KEY on your table, InnoDB uses it as the clustered index. So if you already have a primary key defined, you don't need to do anything else. As the documentation also points out, you should define a primary key for each table that you create.
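To see whether a table already has a primary key, either of the following standard MySQL statements can be used, shown here for table1 as an example:

```sql
-- Shows the full table definition, including any PRIMARY KEY clause.
SHOW CREATE TABLE table1;

-- Returns rows only if a primary key exists on the table.
SHOW INDEX FROM table1 WHERE Key_name = 'PRIMARY';
```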

If you don't have a primary key, here is the code snippet you requested.

-- MySQL has no CLUSTERED keyword; in InnoDB the primary key is the clustered index.
ALTER TABLE table1 ADD CONSTRAINT pk_table1
 PRIMARY KEY (id);

ATTENTION: Keep in mind that creating a clustered index is a big operation for tables like yours with tons of data. This isn't something you want to do on a production server without planning. The operation will also take a long time, and the table may be locked during the process.

Answer 3

Subqueries are not always well-optimized.

I think you can flatten it out to something like:

SELECT  M.*, V.*,
        TIMESTAMPDIFF(YEAR, M.date, CURDATE()) AS age,
        -- Searched CASE, so the NULL test is evaluated as a condition
        CASE WHEN V.name IS NULL  THEN M.name
             WHEN V.name = 'text' THEN M.name
                                  ELSE V.name  END col1,
        -- Correlated on M.id (the outer table1 alias)
        ( SELECT COUNT(*) FROM table3 WHERE field2 = 'value1' AND id = M.id
        ) AS Counts
    FROM table1 AS M
    LEFT JOIN table2 AS V  ON M.id = V.id;

I may have some parts not quite right; see if you can make this formulation work.

3 answers

solution1
1 2019-06-20 20:46:32

solution2
0 2019-06-20 19:49:17

solution3
0 2019-06-20 20:38:57
