SELECT COUNT with JOIN optimization for tables with > 100M rows
I have the following query:
SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0,
COUNT(1) AS sclr_1
FROM applications a0_ INNER JOIN
package_codes p1_ ON a0_.id = p1_.application_id
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
p1_.type = 'Package 1'
GROUP BY sclr_0
--- EDIT ---
Most people focused on the GROUP BY and the SUBSTRING, but that is not the root of the problem.
The following query has the same execution time:
SELECT COUNT(1) AS sclr_1
FROM applications a0_ INNER JOIN
package_codes p1_ ON a0_.id = p1_.application_id
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
p1_.type = 'Package 1'
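--- EDIT 2 ---
After adding an index on applications.created_date and forcing the query to use the specified indexes, as @DDS suggested, the execution time dropped to ~750 ms.
The current query looks like this: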
SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0,
COUNT(1) AS sclr_1
FROM applications a0_ USE INDEX (applications_created_date_idx) INNER JOIN
package_codes p1_ USE INDEX (PRIMARY, UNIQ_70A9C6AA3E030ACD, package_codes_type_idx) ON a0_.id = p1_.application_id
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
p1_.type = 'Package 1'
GROUP BY sclr_0
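--- EDIT 3 ---
I found that listing several indexes in the hint can, in some cases, lead MySQL to pick a non-optimal one, so the final query should look like this: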
SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0,
COUNT(1) AS sclr_1
FROM applications a0_ USE INDEX (applications_created_date_idx) INNER JOIN
package_codes p1_ USE INDEX (package_codes_application_idx) ON a0_.id = p1_.application_id
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
p1_.type = 'Package 1'
GROUP BY sclr_0
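--- END EDIT ---
package_codes contains more than 100,000,000 records.
applications contains more than 250,000 records.
The query takes 2 minutes to return a result. Is there any way to optimize it? I am stuck with MySQL 5.5.
Tables: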
CREATE TABLE `applications` (
`id` int(11) NOT NULL,
`created_date` datetime NOT NULL,
`name` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL,
`surname` varchar(64) COLLATE utf8mb4_unicode_ci NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `applications`
ADD PRIMARY KEY (`id`),
ADD KEY `applications_created_date_idx` (`created_date`);
ALTER TABLE `applications`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
CREATE TABLE `package_codes` (
`id` int(11) NOT NULL,
`application_id` int(11) DEFAULT NULL,
`created_date` datetime NOT NULL,
`type` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
`code` varchar(50) COLLATE utf8mb4_unicode_ci NOT NULL,
`disabled` tinyint(1) NOT NULL DEFAULT '0',
`meta_data` longtext COLLATE utf8mb4_unicode_ci
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
ALTER TABLE `package_codes`
ADD PRIMARY KEY (`id`),
ADD UNIQUE KEY `UNIQ_70A9C6AA3E030ACD` (`application_id`),
ADD KEY `package_codes_code_idx` (`code`),
ADD KEY `package_codes_type_idx` (`type`),
ADD KEY `package_codes_application_idx` (`application_id`),
ADD KEY `package_codes_code_application_idx` (`code`,`application_id`);
ALTER TABLE `package_codes`
MODIFY `id` int(11) NOT NULL AUTO_INCREMENT;
ALTER TABLE `package_codes`
ADD CONSTRAINT `FK_70A9C6AA3E030ACD` FOREIGN KEY (`application_id`) REFERENCES `applications` (`id`);
My suggestion is to avoid this:
SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0,
[...]
GROUP BY sclr_0
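because the DBMS has to recompute that expression for every row and cannot use an index on it. If you store this value in its own column and put an index on it, performance should improve.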
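A minimal sketch of the dedicated-column idea follows. The column name created_day and the index name applications_created_day_idx are hypothetical; neither exists in the schema posted in the question. MySQL 5.5 has no generated columns, so the application (or a trigger) would have to keep the new column in sync on every insert and update.

```sql
-- Hypothetical column holding only the date part of created_date.
ALTER TABLE applications
  ADD COLUMN created_day DATE NULL;

-- One-off backfill from the existing datetime column.
UPDATE applications
SET created_day = DATE(created_date);

-- Hypothetical index name; filtering and grouping now use a plain indexed column.
CREATE INDEX applications_created_day_idx ON applications (created_day);

SELECT a0_.created_day AS sclr_0,
       COUNT(1) AS sclr_1
FROM applications a0_
INNER JOIN package_codes p1_ ON a0_.id = p1_.application_id
WHERE a0_.created_day BETWEEN '2019-01-01' AND '2020-01-01'
  AND p1_.type = 'Package 1'
GROUP BY a0_.created_day;
```

One difference to keep in mind: on a DATE column, BETWEEN '2019-01-01' AND '2020-01-01' includes all of 2020-01-01, whereas the original datetime comparison only matches midnight of that day. Adjust the upper bound if that matters.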
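Or, at least, use the DATE() function instead of SUBSTRING(), and make sure there is an index on applications.created_date. On MySQL 5.5 that index serves the range filter on created_date; the DATE() expression itself cannot be indexed. Note that an index hint has to follow the table it applies to:
SELECT DATE(a0_.created_date) AS sclr_0, COUNT(1) AS sclr_1
FROM applications a0_ FORCE INDEX (date_index) INNER JOIN
package_codes p1_ FORCE INDEX (type_index) ON (a0_.id = p1_.application_id AND a0_.created_date
BETWEEN '2019-01-01' AND '2020-01-01' AND p1_.type = 'Package 1')
GROUP BY DATE(a0_.created_date)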
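Another optimization is to "push" the conditions into the ON clause so that MySQL filters the data before joining, meaning the join is performed on fewer rows. (For an INNER JOIN the optimizer generally treats conditions in ON and in WHERE the same way, so this may not change the plan.)
EDIT: this creates the index on the date:
CREATE INDEX date_index ON applications(created_date);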
If you have more distinct types than dates, you should consider putting an index on type as well:
CREATE INDEX type_index ON package_codes(type);
[EDIT 2] Please post the result of:
select count(distinct date(a0_.created_date)) as N_DATES, count(distinct type)as N_TYPES
FROM applications a0_ INNER JOIN
package_codes p1_ ON a0_.id = p1_.application_id
just to get an idea of which index would be more selective.
A useful link on index optimization with MySQL
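After adding an index on applications.created_date and forcing the query to use the specified indexes, as @DDS suggested, the execution time dropped to ~750 ms.
The final query should look like this: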
SELECT SUBSTRING(a0_.created_date FROM 1 FOR 10) AS sclr_0,
COUNT(1) AS sclr_1
FROM applications a0_ USE INDEX (applications_created_date_idx) INNER JOIN
package_codes p1_ USE INDEX (package_codes_application_idx) ON a0_.id = p1_.application_id
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01' AND
p1_.type = 'Package 1'
GROUP BY sclr_0
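You need to create a composite index. You seem to have created separate indexes on the tables. In this case, you need an index on created_date alone in package_codes, and also a composite index on created_date and type.
Perhaps filter on the date first and do the grouping afterwards.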
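The optimal indexes are
p1_: (type, application_id)
a0_: (created_date, id)
These apply to all (?) versions of the query, except the ones that "force" an index.
The optimizer will try to decide whether to start with p1_ or with a0_. With these indexes, it should have a good chance of picking the better table.
SUBSTRING(a0_.created_date FROM 1 FOR 10) can be simplified to DATE(a0_.created_date), but I doubt it will change performance.
Note that the indexes will be "covering", which gives an extra boost. EXPLAIN indicates this with Using index (not Using index condition).
Further improvement: get rid of package_codes.id and promote application_id to PRIMARY KEY. This may lead to simpler queries!
My suggestions apply to (perhaps) all versions of MySQL.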
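The two suggested indexes could be created roughly as follows. This is only a sketch: the index names are hypothetical, and building an index on a table with 100M+ rows can take a long time on MySQL 5.5, so it is worth trying on a copy first.

```sql
-- Hypothetical index names; the column lists are the ones suggested above.
ALTER TABLE package_codes
  ADD INDEX package_codes_type_application_idx (type, application_id);

ALTER TABLE applications
  ADD INDEX applications_created_date_id_idx (created_date, id);

-- Run without index hints so the optimizer can choose the join order,
-- then look for "Using index" in the Extra column of each row.
EXPLAIN
SELECT DATE(a0_.created_date) AS sclr_0,
       COUNT(1) AS sclr_1
FROM applications a0_
INNER JOIN package_codes p1_ ON a0_.id = p1_.application_id
WHERE a0_.created_date BETWEEN '2019-01-01' AND '2020-01-01'
  AND p1_.type = 'Package 1'
GROUP BY sclr_0;
```

Because InnoDB secondary indexes already carry the primary key, the existing applications_created_date_idx should behave much like (created_date, id); the new index on package_codes is the one more likely to change the plan.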