Slow JOIN Query with OR in WHERE Clause - Missing Possible Indexes?

I'm trying to retrieve a paginated list and total number of "notifications" about a "case" that belongs to a specific user.

The notification has a few conditions: "not locked", "not private" and "not already seen". The query should return the number found, and the rows should then be ordered descending by date created.

The last condition is that the notification wasn't created by the user themselves, or that the notification is of type "conduct" (an enumeration) and the user_id is referenced in the notification's "ref_id" column.

This query is taking over 5 seconds to run against 200k rows in recent_changes and less than 4k rows in cases and 50 users.

+-----+
| cnt |
+-----+
|  13 |
+-----+
1 row in set (4.67 sec)

Can this query be optimized on its own, or will it need restructuring?

SELECT count(*) as cnt
 FROM recent_changes rc 
 LEFT JOIN `case` c on c.id = rc.case_id 
 LEFT JOIN `user` u on u.id = rc.user_id
 WHERE 
 (
   rc.user_id != c.user_id AND c.user_id = '25'
   OR
   (rc.type = 'conduct' AND rc.ref_id = '25')
 )
 AND c.locked = 'N'  AND rc.private != 'Y' 
 AND seen = 'false'
 ORDER BY rc.datecreated DESC;

Explain output

+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
| id | select_type | table | type   | possible_keys            | key                     | key_len | ref                      | rows | Extra                        |
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+
|  1 | SIMPLE      | c     | ALL    | PRIMARY,user_user_id_idx | NULL                    | NULL    | NULL                     | 3699 | Using where; Using temporary |
|  1 | SIMPLE      | rc    | ref    | idx_recent_changes_case  | idx_recent_changes_case | 5       | xxxxxxxxxxxxx.c.id       |   25 | Using where                  |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                  | PRIMARY                 | 4       | xxxxxxxxxxxxx.rc.user_id |    1 | Using index                  |
+----+-------------+-------+--------+--------------------------+-------------------------+---------+--------------------------+------+------------------------------+

Indexes on recent_changes:

+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| Table          | Non_unique | Key_name                     | Seq_in_index | Column_name | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+
| recent_changes |          0 | PRIMARY                      |            1 | id          | A         |      182807 |     NULL | NULL   |      | BTREE      |         |
| recent_changes |          1 | recent_changes_user_id_idx   |            1 | user_id     | A         |          96 |     NULL | NULL   | YES  | BTREE      |         |
| recent_changes |          1 | idx_recent_changes_user_case |            1 | user_id     | A         |          92 |     NULL | NULL   | YES  | BTREE      |         |
| recent_changes |          1 | idx_recent_changes_user_case |            2 | case_id     | A         |       18280 |     NULL | NULL   | YES  | BTREE      |         |
| recent_changes |          1 | idx_recent_changes_case      |            1 | case_id     | A         |        7312 |     NULL | NULL   | YES  | BTREE      |         |
+----------------+------------+------------------------------+--------------+-------------+-----------+-------------+----------+--------+------+------------+---------+

Indexes on case table:

+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| Table | Non_unique | Key_name         | Seq_in_index | Column_name         | Collation | Cardinality | Sub_part | Packed | Null | Index_type | Comment |
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+
| case  |          0 | PRIMARY          |            1 | id                  | A         |        3753 |     NULL | NULL   |      | BTREE      |         |
| case  |          1 | id_idx           |            1 | member_id           | A         |        3753 |     NULL | NULL   | YES  | BTREE      |         |
| case  |          1 | user_user_id_idx |            1 | user_id             | A         |           2 |     NULL | NULL   | YES  | BTREE      |         |
| case  |          1 | case_ha_id       |            1 | health_authority_id | A         |          28 |     NULL | NULL   | YES  | BTREE      |         |
+-------+------------+------------------+--------------+---------------------+-----------+-------------+----------+--------+------+------------+---------+

It does the following in concept:

Find the most recent rows in recent_changes where:

i) the recent_changes row joins to a case table by case_id that is owned by this current user_id

ii) AND the recent_changes row was not created by the current user_id

OR

i) the recent_changes row is of "conduct" type and the current user_id is in the recent_changes.ref_id column

If I remove the "OR (rc.type = 'conduct' AND rc.ref_id = '25')" condition then I get <1s response time.

If I remove the "rc.user_id != c.user_id AND c.user_id = '25' OR" condition it still takes about 5s to complete.


EDIT

Changing the join order shaved off half a second, although I can't join `case` on rc.case_id until I've joined rc first: Unknown column 'rc.user_id' in 'where clause'.

New query:

SELECT count(*) as cnt
FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE 
(
    rc.user_id != c.user_id AND c.user_id = '25'
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false'
ORDER BY rc.datecreated DESC;

Removing the "ORDER BY" clause doesn't seem to speed up the query with the new join order, although I am now more aware of its performance impact.

Using UNION isn't any quicker, but running each SELECT separately showed that the first SELECT takes only 0.3s while the second takes over 4s:

-- Sum the two per-branch counts. A plain count(*) over the derived table
-- would only count its two rows and always return 2.
-- Note: a row matching both branches is counted twice with UNION ALL.
SELECT SUM(cnt) AS cnt
FROM (
SELECT count(*) AS cnt FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE rc.user_id != c.user_id AND c.user_id = '25'
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false'
UNION ALL
SELECT count(*) AS cnt
FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE rc.type = 'conduct' AND rc.ref_id = '25'
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false') x

I believe that the recent_changes rc table doesn't have the necessary index as per the EXPLAIN:

EXPLAIN SELECT count(*) FROM `user` u  LEFT JOIN `recent_changes` rc on u.id = rc.user_id  LEFT JOIN `case` c on c.id = rc.case_id  WHERE rc.user_id != c.user_id AND c.user_id = '25' AND c.locked = 'N'  AND rc.private != 'Y'  AND seen = 'false';

Runs in < .5s

+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                                                   | key                     | key_len | ref                      | rows | Extra       |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | c     | ref    | PRIMARY,user_user_id_idx                                                        | user_user_id_idx        | 5       | const                    |  383 | Using where |
|  1 | SIMPLE      | rc    | ref    | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case | idx_recent_changes_case | 5       | hsaedmp_jason.c.id       |   20 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                                                                         | PRIMARY                 | 4       | hsaedmp_jason.rc.user_id |    1 | Using index |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+

Runs in > 4s

EXPLAIN SELECT count(*) as cnt FROM `user` u  LEFT JOIN `recent_changes` rc on u.id = rc.user_id  LEFT JOIN `case` c on c.id = rc.case_id  WHERE rc.type = 'conduct' AND rc.ref_id = '25' AND c.locked = 'N'  AND rc.private != 'Y'  AND seen = 'false';

Key = NULL which is not good.

+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
| id | select_type | table | type   | possible_keys                                                                   | key                     | key_len | ref                      | rows | Extra       |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+
|  1 | SIMPLE      | c     | ALL    | PRIMARY                                                                         | NULL                    | NULL    | NULL                     | 3797 | Using where |
|  1 | SIMPLE      | rc    | ref    | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case | idx_recent_changes_case | 5       | hsaedmp_jason.c.id       |   20 | Using where |
|  1 | SIMPLE      | u     | eq_ref | PRIMARY                                                                         | PRIMARY                 | 4       | hsaedmp_jason.rc.user_id |    1 | Using index |
+----+-------------+-------+--------+---------------------------------------------------------------------------------+-------------------------+---------+--------------------------+------+-------------+

I'm confused that the EXPLAIN shows that the case table is not using a key, but it appears that the recent_changes table is the one that needs to have an INDEX on the ref_id column?
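For reference, the index I have in mind could be added roughly as follows. This is only a sketch: the name idx_recent_changes_ref is the one that shows up in the EXPLAIN output once the index is in place, and the assumption that it covers ref_id alone is based on the key_len of 5 and the const ref reported there.

```sql
-- Assumed definition: a single-column index on ref_id, so that
-- "rc.ref_id = '25'" can be resolved by an index lookup instead of
-- reaching recent_changes only through the join on case_id.
ALTER TABLE recent_changes ADD INDEX idx_recent_changes_ref (ref_id);

-- Re-run the slow branch to check which key the optimizer now picks for rc.
EXPLAIN SELECT count(*) AS cnt
FROM `user` u
LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
LEFT JOIN `case` c ON c.id = rc.case_id
WHERE rc.type = 'conduct' AND rc.ref_id = '25'
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false';
```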

Here is the explain with that index, which looks much better here, but I haven't been able to test it on production yet.

+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
| id | select_type | table | partitions | type   | possible_keys                                                                                                                       | key                    | key_len | ref                      | rows | filtered | Extra       |
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+
|  1 | SIMPLE      | rc    | NULL       | ref    | recent_changes_user_id_idx,idx_recent_changes_user_case,idx_recent_changes_case,idx_recent_changes_case_date,idx_recent_changes_ref | idx_recent_changes_ref | 5       | const                    | 2096 |     3.12 | Using where |
|  1 | SIMPLE      | u     | NULL       | eq_ref | PRIMARY                                                                                                                             | PRIMARY                | 4       | hsaedmp_jason.rc.user_id |    1 |   100.00 | Using index |
|  1 | SIMPLE      | c     | NULL       | eq_ref | PRIMARY                                                                                                                             | PRIMARY                | 4       | hsaedmp_jason.rc.case_id |    1 |    50.00 | Using where |
+----+-------------+-------+------------+--------+-------------------------------------------------------------------------------------------------------------------------------------+------------------------+---------+--------------------------+------+----------+-------------+

UPDATE

I reworked the query to use a UNION statement, changed the JOIN order and added a compound index on the recent_changes table. Together these changes bring the query response time to under 10ms.

Here is the new query using a UNION statement.

select count(*) as num
FROM (
(
SELECT rc1.*
FROM `user` u1 
LEFT JOIN `recent_changes` rc1 on u1.id = rc1.user_id 
LEFT JOIN `case` c1 on c1.id = rc1.case_id 
WHERE 
(rc1.user_id != c1.user_id AND c1.user_id = '1')
AND c1.locked = 'Y'
AND rc1.private != 'Y' 
AND seen = 'false'
ORDER BY rc1.datecreated DESC
)
UNION
(
SELECT rc.*
FROM `user` u 
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE
(rc.type = 'conduct' AND rc.ref_id = '1')
AND c.locked = 'Y'
AND rc.private != 'Y' 
AND seen = 'false'
ORDER BY rc.datecreated DESC
)
) x;

And the index I created based on the final query I was needing.

ALTER TABLE recent_changes ADD INDEX idx_recent_changes_notification (type, ref_id, private, seen, user_id);
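A possible further simplification, offered as a sketch rather than a tested replacement: because the WHERE clause filters on columns of `case` and recent_changes, the LEFT JOINs already behave like inner joins, and no column of `user` is read. Assuming every recent_changes.user_id that matters refers to an existing user (an assumption, since the full schema isn't shown here), the `user` join can be dropped. Selecting only rc.id lets UNION de-duplicate on the primary key instead of on whole rows, and the per-branch ORDER BY is left out because sorting cannot change a count. The seen column is qualified as rc.seen on the basis that it is part of the recent_changes index above.

```sql
SELECT count(*) AS num
FROM (
    -- Branch 1: changes on cases owned by user 1 that were made by someone else.
    SELECT rc.id
    FROM `case` c
    JOIN recent_changes rc ON rc.case_id = c.id
    WHERE c.user_id = '1'
      AND rc.user_id != c.user_id
      AND c.locked = 'Y'
      AND rc.private != 'Y'
      AND rc.seen = 'false'
    UNION
    -- Branch 2: "conduct" changes that reference user 1 in ref_id.
    SELECT rc.id
    FROM recent_changes rc
    JOIN `case` c ON c.id = rc.case_id
    WHERE rc.type = 'conduct'
      AND rc.ref_id = '1'
      AND c.locked = 'Y'
      AND rc.private != 'Y'
      AND rc.seen = 'false'
) x;
```

UNION rather than UNION ALL matters here: a row that satisfies both branches is counted once, which matches the behaviour of the original OR condition.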

Thanks for everyone's input!

The smaller table should be placed first in the join clause, although this depends on how many records the tables hold. I think your user table is the smallest one, so place it first. The 'rc' table seems to be the biggest one, so it should be placed last in the join.

Here's an example.

SELECT count(*) as cnt
FROM `user` u 
-- rc has to be joined before `case`, because the ON clause for `case` refers to rc.case_id
LEFT JOIN `recent_changes` rc on u.id = rc.user_id 
LEFT JOIN `case` c on c.id = rc.case_id 
WHERE 
(
    rc.user_id != c.user_id AND c.user_id = '25'
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N'  AND rc.private != 'Y' 
AND seen = 'false'
ORDER BY rc.datecreated DESC;

Also, see the post below. It is about MS SQL Server, but the same point applies to almost every DBMS.

https://www.mssqltips.com/sqlservertutorial/3201/how-join-order-can-affect-the-query-plan/

Update

I reviewed your question and found another suspect: the ORDER BY clause. As the number of rows returned by the query grows, the time cost of ORDER BY increases dramatically. It has been a frequent problem in my experience. Have you tried removing the ORDER BY clause? Is it much faster?

See Why is this INNER JOIN/ORDER BY mysql query so slow?
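To make the ORDER BY point concrete, here is a rough sketch of splitting the two needs mentioned in the question. The count query needs no ORDER BY at all, while the paginated list needs ORDER BY together with LIMIT. The page size of 20 and the offset of 0 are placeholder values, not something taken from the question.

```sql
-- Total: sorting cannot change a count, so ORDER BY is left out.
SELECT count(*) AS cnt
FROM `user` u
LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
LEFT JOIN `case` c ON c.id = rc.case_id
WHERE
(
    (rc.user_id != c.user_id AND c.user_id = '25')
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false';

-- One page of the list: sort, then keep only the rows for that page.
-- LIMIT 20 OFFSET 0 is a hypothetical first page.
SELECT rc.*
FROM `user` u
LEFT JOIN `recent_changes` rc ON u.id = rc.user_id
LEFT JOIN `case` c ON c.id = rc.case_id
WHERE
(
    (rc.user_id != c.user_id AND c.user_id = '25')
    OR
    (rc.type = 'conduct' AND rc.ref_id = '25')
)
AND c.locked = 'N' AND rc.private != 'Y'
AND seen = 'false'
ORDER BY rc.datecreated DESC
LIMIT 20 OFFSET 0;
```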
