MySQL: composite index for a large table query

The following query runs against user_chars (approx. 20 million records) and user_data (approx. 10 million records). It runs too slowly, and I was wondering whether better composite indexes might improve the situation.

Any idea what the best composite index would be?

SELECT username, title, status  
FROM (  
    SELECT username, title, status  
    FROM user_chars w, user_data r  
    WHERE w.user_id = r.user_id  
    AND (status < '300' OR is_admin = '1')    
    AND (  
        (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
        OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
        OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
        OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)  
        ...  
    )  
    GROUP BY w.user_id  
    HAVING COUNT(*) >= 3  
) data  
WHERE username != '0'  
AND title != '0'

Here are the tables:

CREATE TABLE user_data (
  user_id int(10) unsigned NOT NULL AUTO_INCREMENT,
  username decimal(17,14) DEFAULT NULL,
  title decimal(17,14) DEFAULT NULL,
  status smallint(6) unsigned NOT NULL,
  is_admin tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (user_id),
  KEY username (username),
  KEY title (title),
  KEY status (status),
  KEY is_admin (is_admin),
  KEY chars_avg_index (user_id,username,title,status)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;


CREATE TABLE user_chars (
  user_id int(10) unsigned NOT NULL,
  rating_id char(32) DEFAULT NULL,
  rating tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (user_id),
  KEY rating_id (rating_id),
  KEY rating (rating),
  KEY chars_index (user_id,rating_id,rating)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

EDIT: Added the EXPLAIN

+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys                              | key             | key_len | ref       | rows  | Extra                                                     |
+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                       | NULL            | NULL    | NULL      |  3668 | Using where                                               |
|  2 | DERIVED     | w          | range  | user_id,rating_id,rating,chars_index       | chars_index     | 98      | NULL      | 13215 | Using where; Using index; Using temporary; Using filesort |
|  2 | DERIVED     | r          | eq_ref | PRIMARY,status,is_admin,chars_avg_index    | PRIMARY         | 4       | w.user_id |     1 | Using where                                               |
+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+

When I look at the EXPLAIN output for this query, it looks like MySQL is applying the WHERE clause of the inner query to user_chars before doing the join with user_data. So, adding an index on (rating_id, rating) (without user_id) in user_chars should help with the WHERE clause of the inner query:

ALTER TABLE user_chars ADD INDEX (rating_id, rating);
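
One way to check whether such an index is actually picked up is to give it a name and run EXPLAIN on just the user_chars part of the filter. The following is a minimal sketch: the index name idx_rating_id_rating is made up for illustration, and only the four rating ranges quoted in the question are included.

```sql
-- Hypothetical index name; the column order follows the suggestion above.
ALTER TABLE user_chars ADD INDEX idx_rating_id_rating (rating_id, rating);

-- Inspect the plan for the user_chars filter on its own.
EXPLAIN
SELECT user_id
FROM user_chars
WHERE (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
   OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
   OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
   OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100);
```

If the key column of the output names the new index with type range, the optimizer is using it for the OR-ed ranges. If it still prefers chars_index, the new index is probably not helping this query.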

Edit: this behavior depends on how many rows are in each table, so posting your EXPLAIN output would be helpful :]

Edit2: I would also rewrite the query as follows:

SELECT username, title, status  
FROM user_chars w, user_data r  
WHERE w.user_id = r.user_id  
AND (status < '300' OR is_admin = '1')    
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
    ...
)  
AND username != '0'  
AND title != '0'
GROUP BY w.user_id  
HAVING COUNT(*) >= 3  

That's an interesting execution plan. I'm afraid I can't really offer any particularly concrete advice, mostly since I didn't manage to come up with any simple test data that would convince my MySQL server to use the same plan.

I do have some random suggestions, though:

  • You don't really need the nested query — you can just use HAVING COUNT(*) >= 3 AND username != '0' AND title != '0' for the same effect. Or you could try moving the username and title conditions into the inner WHERE clause.

  • My tests suggest that MySQL isn't smart enough to use an index merge and/or range optimization for the status < '300' OR is_admin = '1' condition, even if I create an index on (is_admin, status). It might be a good idea to create a single column that encodes both of these values, preferably in such a manner that you only need a single range comparison on it.

  • You might also consider getting rid of any indexes you don't need, unless they're needed by other queries. Unused indexes just take up space, slow down INSERTs and confuse the query planner.

  • If you haven't done so recently, run ANALYZE TABLE on your tables and see if the execution plan changes.
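
The single-column idea from the second suggestion could look roughly like this. It is only a sketch: the column name access_rank and the index name are invented here, and it assumes admins can be mapped to 0, which falls below any positive threshold because status is unsigned. The application (or a trigger) would have to keep the column in sync on every write.

```sql
-- Hypothetical helper column: 0 for admins, otherwise the user's status.
ALTER TABLE user_data
    ADD COLUMN access_rank SMALLINT UNSIGNED NOT NULL DEFAULT 0,
    ADD INDEX idx_access_rank (access_rank);

-- Backfill existing rows; until this has run, every row holds the default 0.
UPDATE user_data
SET access_rank = IF(is_admin = 1, 0, status);

-- (status < 300 OR is_admin = 1) then becomes a single range comparison.
SELECT user_id
FROM user_data
WHERE access_rank < 300;
```

With this encoding the OR disappears from the WHERE clause, so a plain range scan on one index is enough for that part of the filter.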

The current structure of the user_data table unfortunately prevents efficient use of any indexes.

Basically, the overall condition on the data taken from user_data is the following:

WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')

These conditions should be applied before the aggregation; otherwise the aggregation would process excess rows.

Indexes work best when you search for values equal to something and the conditions are joined with AND; your case is the opposite. Thus, to optimize the query you can introduce a denormalized column that stores the result of (username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')) and index it. Until then, we will continue with what we have.
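
A denormalized flag of that kind might be set up as follows. This is a sketch under assumptions: the column name is_listed and the index name are hypothetical, and the flag has to be refreshed by the application or by triggers whenever username, title, status or is_admin changes.

```sql
-- Hypothetical flag column holding the precomputed user_data condition.
ALTER TABLE user_data
    ADD COLUMN is_listed TINYINT(1) NOT NULL DEFAULT 0,
    ADD INDEX idx_is_listed (is_listed);

-- Backfill. IFNULL maps the NULL produced by a NULL username or title to 0,
-- matching the original WHERE clause, which rejects such rows.
UPDATE user_data
SET is_listed = IFNULL(
        username != 0
        AND title != 0
        AND (status < 300 OR is_admin = 1),
        0);

-- The whole user_data condition collapses to one equality.
SELECT user_id
FROM user_data
WHERE is_listed = 1;
```

A two-valued flag only narrows the search when relatively few rows qualify, so it is worth checking the split with SELECT is_listed, COUNT(*) FROM user_data GROUP BY is_listed before relying on it.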

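You then join the result with user_chars, whose filter again contains several ORs, but all of them operate on rating_id and rating. Since the rating column is more selective (it has more distinct values), it is a good idea to put it on the left in a composite index (rating, rating_id). With that index in place you no longer need the indexes on (rating) and (rating_id, rating), so you can drop them. That said, each OR branch pairs an equality on rating_id with a range on rating, so it is worth confirming with EXPLAIN that the new index is really chosen before dropping the old ones.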
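
Which column order serves these predicates better can be checked directly rather than assumed. A rough sketch, with both index names invented for illustration and only two of the ranges shown, is to create both orders, force each one in turn, and compare the key_len and rows values that EXPLAIN reports:

```sql
-- Hypothetical index names, one per column order.
ALTER TABLE user_chars
    ADD INDEX idx_rating_first (rating, rating_id),
    ADD INDEX idx_rating_id_first (rating_id, rating);

-- Plan when the (rating, rating_id) index is forced.
EXPLAIN
SELECT user_id
FROM user_chars FORCE INDEX (idx_rating_first)
WHERE (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
   OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60);

-- Plan when the (rating_id, rating) index is forced.
EXPLAIN
SELECT user_id
FROM user_chars FORCE INDEX (idx_rating_id_first)
WHERE (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
   OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60);
```

Whichever index yields the lower rows estimate on the real data is the one to keep; the other can then be removed with ALTER TABLE user_chars DROP INDEX.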

Now, I am not sure whether MySQL can do this optimization by itself, so you need to compare the execution of the following two queries:

SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3

and the second one:

SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id IN ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100 -- lowest and highest bound of the ranges below; adjust to cover the ranges elided by ... in your query
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3

The latter query might perform faster, because the extra IN and BETWEEN conditions give the optimizer an explicit, simple range to look up in our index. Besides, both queries select only user_id, so no memory is wasted on other columns during aggregation. Now you can join the result of the faster query back to the user_data table:

SELECT username, title, status
FROM (
    SELECT user_id
    FROM user_data JOIN user_chars USING (user_id)
    WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
    AND rating_id IN ('rating1', 'rating2', 'rating3', 'rating4')
    AND rating BETWEEN 30 AND 100
    AND (
        (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
        OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
        OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
        OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
    )
    GROUP BY user_id
    HAVING COUNT(*) >= 3
) AS user_ids JOIN user_data USING (user_id);
