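MySQL: composite index for large table query
The following query runs against user_chars (about 20 million rows) and user_data (about 10 million rows). It is too slow, and I would like to know whether a better composite index could improve it.
Any ideas on what the best composite index would be?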
SELECT username, title, status
FROM (
SELECT username, title, status
FROM user_chars w, user_data r
WHERE w.user_id = r.user_id
AND (status < '300' OR is_admin = '1')
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
...
)
GROUP BY w.user_id
HAVING COUNT(*) >= 3
) data
WHERE username != '0'
AND title != '0'
Here are the tables:
CREATE TABLE user_data (
user_id int(10) unsigned NOT NULL AUTO_INCREMENT,
username decimal(17,14) DEFAULT NULL,
title decimal(17,14) DEFAULT NULL,
status smallint(6) unsigned NOT NULL,
is_admin tinyint(1) NOT NULL DEFAULT '0',
PRIMARY KEY (user_id),
KEY username (username),
KEY title (title),
KEY status (status),
KEY is_admin (is_admin),
KEY chars_avg_index (user_id,username,title,status)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
CREATE TABLE user_chars (
user_id int(10) unsigned NOT NULL,
rating_id char(32) DEFAULT NULL,
rating tinyint(3) unsigned NOT NULL,
PRIMARY KEY (user_id),
KEY rating_id (rating_id),
KEY rating (rating),
KEY chars_index (user_id,rating_id,rating)
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
Edit: added the EXPLAIN output
+----+-------------+------------+--------+-----------------------------------------+-------------+---------+-----------+-------+-----------------------------------------------------------+
| id | select_type | table      | type   | possible_keys                           | key         | key_len | ref       | rows  | Extra                                                     |
+----+-------------+------------+--------+-----------------------------------------+-------------+---------+-----------+-------+-----------------------------------------------------------+
|  1 | PRIMARY     | <derived2> | ALL    | NULL                                    | NULL        | NULL    | NULL      | 3668  | Using where                                               |
|  2 | DERIVED     | w          | range  | user_id,rating_id,rating,chars_index    | chars_index | 98      | NULL      | 13215 | Using where; Using index; Using temporary; Using filesort |
|  2 | DERIVED     | r          | eq_ref | PRIMARY,status,is_admin,chars_avg_index | PRIMARY     | 4       | w.user_id | 1     | Using where                                               |
+----+-------------+------------+--------+-----------------------------------------+-------------+---------+-----------+-------+-----------------------------------------------------------+
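Looking at the EXPLAIN output for this query, it appears that MySQL applies the inner query's WHERE clause to user_chars before joining with user_data. So adding an index on user_chars (rating_id, rating), without user_id, should help the inner query's WHERE clause: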
ALTER TABLE user_chars ADD INDEX (rating_id, rating);
Edit: this behavior depends on how many rows each table holds, so posting the EXPLAIN output was helpful :]
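One way to check whether the suggested index is actually picked up is to create it under an explicit name and re-run EXPLAIN on the filtering part of the query. This is only a sketch: the index name rating_id_rating is an assumption made for illustration, and the rating conditions elided as ... in the question are left out.

```sql
-- Hypothetical index name; any name not already in use works.
ALTER TABLE user_chars ADD INDEX rating_id_rating (rating_id, rating);

-- Re-check the plan for the part of the query that filters user_chars.
EXPLAIN
SELECT w.user_id
FROM user_chars w
WHERE (w.rating_id = 'rating1' AND w.rating BETWEEN 55 AND 65)
   OR (w.rating_id = 'rating2' AND w.rating BETWEEN 50 AND 60)
   OR (w.rating_id = 'rating3' AND w.rating BETWEEN 30 AND 40)
   OR (w.rating_id = 'rating4' AND w.rating BETWEEN 90 AND 100)
GROUP BY w.user_id
HAVING COUNT(*) >= 3;
```

If the key column of the EXPLAIN output still shows chars_index, the optimizer preferred the old index for this data distribution, and the new one may not be worth keeping.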
Edit 2: I would also rewrite the query as follows:
SELECT username, title, status
FROM user_chars w, user_data r
WHERE w.user_id = r.user_id
AND (status < '300' OR is_admin = '1')
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
...
)
AND username != '0'
AND title != '0'
GROUP BY w.user_id
HAVING COUNT(*) >= 3
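That is an interesting execution plan. I'm afraid I can't offer anything very specific, mainly because I haven't managed to come up with simple test data that persuades my MySQL server to use the same plan.
I do have some suggestions, though: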
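You don't actually need the nested query. You can get the same effect with HAVING COUNT(*) >= 3 AND username != '0' AND title != '0'. Alternatively, you could try moving the username and title conditions into the inner WHERE clause.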
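My tests suggest that MySQL is not smart enough to use index merge and/or range optimization for the status < '300' OR is_admin = '1' condition, even when I create an index on (is_admin, status). It may be a good idea to create a separate column that encodes both values, preferably in a way that needs only a single range comparison.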
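You might also consider dropping any indexes you don't need, unless other queries require them. Unused indexes just take up space, slow down INSERTs, and give the query planner more to get confused by.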
If you haven't done so recently, run ANALYZE TABLE on both tables and see whether the execution plan changes.
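The combined-column suggestion above could be sketched as follows. The column name visibility_rank and its encoding (0 for admins, otherwise the status value) are assumptions made for illustration; the application, or a trigger, would have to keep the column in sync whenever status or is_admin changes.

```sql
-- Hypothetical column: 0 for admins, otherwise a copy of status.
ALTER TABLE user_data
  ADD COLUMN visibility_rank SMALLINT UNSIGNED NOT NULL DEFAULT 0;

-- Backfill existing rows.
UPDATE user_data
SET visibility_rank = IF(is_admin = 1, 0, status);

ALTER TABLE user_data ADD INDEX visibility_rank (visibility_rank);

-- (status < 300 OR is_admin = 1) becomes a single range comparison.
SELECT user_id
FROM user_data
WHERE visibility_rank < 300;
```

Admins map to 0, which always satisfies the range, while non-admins pass only when their status is below 300, so the rewritten predicate selects the same rows as the original OR condition.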
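Unfortunately, the current structure of the user_data table prevents any index from being used effectively.
Basically, the overall condition on the rows fetched from user_data is: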
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
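These conditions should be applied before the aggregation; otherwise the aggregation processes rows that are discarded later.
Indexes do their best work when you search for something that equals something else and the conditions are combined with AND. Here the situation is the opposite: inequalities combined with OR. So, to optimize the query, you could introduce a denormalized, indexed column that stores the result of (username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')). Until then, we work with what we have.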
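A minimal sketch of such a denormalized column is shown below. The column name is_candidate is hypothetical, and the sketch assumes the flag is recomputed by the application whenever username, title, status, or is_admin changes.

```sql
-- Hypothetical flag holding the precomputed result of the user_data filter.
ALTER TABLE user_data
  ADD COLUMN is_candidate TINYINT(1) NOT NULL DEFAULT 0;

-- username and title are nullable, so the expression can evaluate to NULL;
-- IFNULL maps that to 0, matching how WHERE would reject such rows.
UPDATE user_data
SET is_candidate = IFNULL(
      username != '0' AND title != '0' AND (status < 300 OR is_admin = 1),
      0);

ALTER TABLE user_data ADD INDEX is_candidate (is_candidate);

-- The filter then becomes a plain equality test.
SELECT user_id
FROM user_data
WHERE is_candidate = 1;
```

Whether an index on a two-valued column helps depends on how many rows carry the flag; if most users qualify, the optimizer may still prefer a full scan.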
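You join that result with user_chars, whose conditions also contain several ORs, but all of them operate on rating_id and rating. Since the rating column has higher selectivity (more distinct values), it is better to put it on the left of a composite index (rating, rating_id). With that index in place you no longer need the indexes on (rating) and (rating_id, rating), so just drop them. Because rating is only compared by range here while rating_id is compared by equality, it is worth benchmarking this column order against (rating_id, rating) before dropping anything.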
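As a rough sketch, the index change described above might look like this. The name rating_rating_id is an assumption; rating is the name of the single-column index in the CREATE TABLE statement from the question, and the name of any (rating_id, rating) index depends on how it was created, so it is looked up rather than guessed.

```sql
-- List the existing indexes and their names first.
SHOW INDEX FROM user_chars;

-- Add the composite index (hypothetical name) and drop the
-- single-column index on rating, which it makes redundant.
ALTER TABLE user_chars
  ADD INDEX rating_rating_id (rating, rating_id),
  DROP INDEX rating;

-- If a (rating_id, rating) index was added earlier, drop it by the
-- name reported by SHOW INDEX, for example:
-- ALTER TABLE user_chars DROP INDEX <name_from_show_index>;
```

Rebuilding indexes on a MyISAM table with about 20 million rows locks the table for the duration, so this is best tried on a copy first.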
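Now, I'm not sure whether MySQL can perform this optimization on its own, so you need to compare how the following two queries execute. The first: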
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3
And the second:
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100 -- envelope of the ranges below; widen it to cover the ranges elided as ... in your query
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
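HAVING COUNT(*) >= 3
The second query may execute faster, because the extra rating_id IN (...) and rating BETWEEN conditions give the optimizer a simple range it can match against our index. In addition, both queries select only user_id values, so no memory is wasted on other columns during the aggregation. Now you can join the result of the faster query back to the user_data table: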
SELECT username, title, status
FROM (
SELECT user_id
FROM user_data JOIN user_chars USING (user_id)
WHERE username != '0' AND title != '0' AND (status < '300' OR is_admin = '1')
AND rating_id in ('rating1', 'rating2', 'rating3', 'rating4')
AND rating BETWEEN 30 AND 100
AND (
(rating_id = 'rating1' AND rating BETWEEN 55 AND 65)
OR (rating_id = 'rating2' AND rating BETWEEN 50 AND 60)
OR (rating_id = 'rating3' AND rating BETWEEN 30 AND 40)
OR (rating_id = 'rating4' AND rating BETWEEN 90 AND 100)
)
GROUP BY user_id
HAVING COUNT(*) >= 3
) as user_ids JOIN user_data USING (user_id);