
MySQL, composite index for large table query

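The following query runs against user_chars (about 20 million records) and user_data (about 10 million records). It runs too slowly, and I'd like to know whether a better composite index could improve things.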

Any thoughts on what the best composite index would be?

SELECT username, title, status  
FROM (  
    SELECT username, title, status  
    FROM user_chars w, user_data r  
    WHERE w.user_id = r.user_id  
    AND (status < '300' OR is_admin = '1')    
    AND (  
        (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
        OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
        OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
        OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)  
        ...  
    )  
    GROUP BY w.user_id  
    HAVING COUNT(*) >= 3  
) data  
WHERE username != '0'  
AND title != '0'

Here are the tables:

CREATE TABLE user_data (
  user_id int(10) unsigned NOT NULL AUTO_INCREMENT,
  username decimal(17,14) DEFAULT NULL,
  title decimal(17,14) DEFAULT NULL,
  status smallint(6) unsigned NOT NULL,
  is_admin tinyint(1) NOT NULL DEFAULT '0',
  PRIMARY KEY (user_id),
  KEY username (username),
  KEY title (title),
  KEY status (status),
  KEY is_admin (is_admin),
  KEY chars_avg_index (user_id,username,title,status)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;


CREATE TABLE user_chars (
  user_id int(10) unsigned NOT NULL,
  rating_id char(32) DEFAULT NULL,
  rating tinyint(3) unsigned NOT NULL,
  PRIMARY KEY (user_id),
  KEY rating_id (rating_id),
  KEY rating (rating),
  KEY chars_index (user_id,rating_id,rating)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8;

Edit: added the EXPLAIN output

+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys                              | key             | key_len | ref       | rows  | Extra                                                     |
+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                       | NULL            | NULL    | NULL      |  3668 | Using where                                               |
|  2 | DERIVED     | w          | range  | user_id,rating_id,rating,chars_index       | chars_index     | 98      | NULL      | 13215 | Using where; Using index; Using temporary; Using filesort |
|  2 | DERIVED     | r          | eq_ref | PRIMARY,status,is_admin,chars_avg_index    | PRIMARY         | 4       | w.user_id |     1 | Using where                                               |
+----+-------------+------------+--------+--------------------------------------------+-----------------+---------+-----------+-------+-----------------------------------------------------------+

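Looking at the EXPLAIN output for this query, it appears that MySQL applies the inner query's WHERE clause to user_chars before joining it with user_data. So adding an index on user_chars (rating_id, rating), without user_id, should help the inner query's WHERE clause: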

ALTER TABLE user_chars ADD INDEX (rating_id, rating);

Edit: this behavior depends on how many rows are in each table, so posting the EXPLAIN output would be helpful :]
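A minimal sketch of how the effect of that index could be checked, assuming it has already been added. Only the four ranges shown in the question are used; the conditions elided there are left out, so the row estimates will differ from the real query.

```sql
-- List the indexes on user_chars to confirm the (rating_id, rating)
-- index exists and to see the name MySQL assigned to it.
SHOW INDEX FROM user_chars;

-- Check which index the range conditions pick on user_chars alone.
EXPLAIN
SELECT user_id
FROM user_chars
WHERE (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
   OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
   OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
   OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100);
```

If the key column of the EXPLAIN row still shows chars_index, the optimizer has not switched to the new index for this predicate.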

Edit2: I would also rewrite the query as follows:

SELECT username, title, status  
FROM user_chars w, user_data r  
WHERE w.user_id = r.user_id  
AND (status < '300' OR is_admin = '1')    
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
    ...
)  
AND username != '0'  
AND title != '0'
GROUP BY w.user_id  
HAVING COUNT(*) >= 3  

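That's an interesting execution plan. I'm afraid I can't offer any particularly specific advice, mainly because I haven't managed to come up with simple test data that persuades my MySQL server to use the same plan.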

I do have some suggestions, though:

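  • You don't actually need the nested query: you can get the same effect with just HAVING COUNT(*) >= 3 AND username != '0' AND title != '0'. Alternatively, you could try moving the username and title conditions into the inner WHERE clause.

  • My testing suggests that even with an index on (is_admin, status), MySQL isn't smart enough to use index merge and/or range optimization for the status < '300' OR is_admin = '1' condition. It may be a good idea to create a separate column that encodes both values, preferably in a way that needs only a single range comparison.

  • You might also consider getting rid of any indexes you don't need, unless other queries require them. Unused indexes just take up space, slow down INSERTs, and confuse the query planner.

  • If you haven't done so recently, run ANALYZE TABLE on both tables and see whether the execution plan changes.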
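The combined-column and ANALYZE TABLE suggestions could look roughly like the sketch below. The column name effective_status and the index name idx_effective_status are hypothetical, not part of the original schema, and the column has to be kept in sync by whatever code writes status and is_admin.

```sql
-- Hypothetical column: admins collapse to 0, everyone else keeps status,
-- so (status < 300 OR is_admin = 1) becomes effective_status < 300.
ALTER TABLE user_data
  ADD COLUMN effective_status SMALLINT UNSIGNED NOT NULL DEFAULT 0;

UPDATE user_data
SET effective_status = IF(is_admin = 1, 0, status);

ALTER TABLE user_data
  ADD INDEX idx_effective_status (effective_status);

-- The OR condition is now a single range comparison.
SELECT user_id
FROM user_data
WHERE effective_status < 300;

-- Refresh the key distribution statistics, then re-run EXPLAIN
-- on the original query to see whether the plan changed.
ANALYZE TABLE user_chars, user_data;
```

Whether the range scan on the new column beats the existing plan depends on how many rows fall below 300, so it is worth comparing the EXPLAIN output before and after.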

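Unfortunately, the current structure of the user_data table prevents any index from being used efficiently.

Basically, the overall condition on the data fetched from user_data is as follows: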

WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')

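These conditions should be applied before the aggregation; otherwise the aggregation processes rows that will be discarded anyway.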

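Indexes work best when you search for something equal to something else and the conditions are combined with AND; here the situation is just the opposite. So, to optimize the query, you could introduce a denormalized, indexed column that stores the result of (username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')). Until then, we'll work with what we have.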
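A sketch of such a denormalized column, under the assumption that the application (or a trigger) keeps it up to date on every write. The names is_eligible and idx_is_eligible are hypothetical.

```sql
-- Hypothetical flag column holding the precomputed filter result.
ALTER TABLE user_data
  ADD COLUMN is_eligible TINYINT(1) NOT NULL DEFAULT 0;

-- username and title are nullable, so a NULL comparison result is
-- stored as 0; the original WHERE clause rejects those rows too.
UPDATE user_data
SET is_eligible = IFNULL(
      username != '0'
      AND title != '0'
      AND (status < '300' OR is_admin = '1'),
      0);

ALTER TABLE user_data
  ADD INDEX idx_is_eligible (is_eligible);

-- The three-part filter becomes one equality test.
SELECT user_id
FROM user_data
WHERE is_eligible = 1;
```

An index on a two-valued column only pays off when the matching rows are a small share of the table, so this is worth measuring rather than assuming.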

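You join the result with user_chars, whose condition contains several ORs, but all of them operate on rating_id and rating. Since the rating column has higher selectivity (more distinct values), it is better to put it on the left of a composite index (rating, rating_id). With that index in place you no longer need the indexes on (rating) and (rating_id, rating), so just drop them.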
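In DDL terms this could be sketched as follows. The index name rating_rating_id is a hypothetical choice; the dropped index name rating comes from the CREATE TABLE statement in the question.

```sql
-- Add the composite index and drop the single-column index on rating,
-- which the new index makes redundant as a leftmost prefix.
ALTER TABLE user_chars
  ADD INDEX rating_rating_id (rating, rating_id),
  DROP INDEX rating;

-- If an index on (rating_id, rating) was added earlier, look up its
-- name here before dropping it with ALTER TABLE ... DROP INDEX.
SHOW INDEX FROM user_chars;
```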

Now, I'm not sure whether MySQL can do this optimization by itself, so you need to compare how the following two queries execute. The first:

SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3

And the second:

SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100 -- widest span of the ranges below; adjust it to cover the ranges elided as ... in your query
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3

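The latter query may run faster, because it contains an explicit hint to use our index. In addition, both queries select only user_id values and so don't waste memory during aggregation. Now you can join the result of the faster query back to the user_data table: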

SELECT username, title, status
FROM (
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100
AND (  
    (rating_id = 'rating1' AND rating BETWEEN 55 AND 65)  
    OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)  
    OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)  
    OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3
) as user_ids JOIN user_data USING (user_id);
