Possible to improve the performance of this SQL query?
I have a table with more than 100,000,000 rows, and I have a query that looks like this:
SELECT
COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, null)) AS 'monthly',
COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, null)) AS 'weekly',
COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, null)) AS '30day',
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null)) AS 'recent'
FROM
items
WHERE
user_id = 123456;
The table looks like this:
CREATE TABLE `items` (
`user_id` int(11) NOT NULL,
`item_id` int(11) NOT NULL,
`created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
PRIMARY KEY (`user_id`,`item_id`),
KEY `user_id` (`user_id`,`created_at`),
KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
The EXPLAIN output looks fairly harmless, apart from the large number of rows:
id  select_type  table  type  possible_keys    key      key_len  ref    rows    Extra
1   SIMPLE       items  ref   PRIMARY,user_id  user_id  4        const  559864  Using index
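I use the query to gather counts for a specific user over 4 time ranges. Is there a smarter/faster way to get the same data, or is my only option to tally these counts as new rows are put into this table?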
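If you have an index on created_at, I would also put created_at >= '2015-06-30 07:57:56' in the WHERE clause, since that is the lowest date in your breakdown.
Also, with the same index, it may work to split this into 4 queries: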
select count(*) AS '30day'
FROM
items
WHERE
user_id = 123456
and created_at >= '2015-06-30 07:57:56'
union ....
and so on.
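The first suggestion above, adding the lowest boundary to the WHERE clause, could look like the sketch below when applied to the query from the question. It assumes the optimizer keeps using the existing (user_id, created_at) index shown in the table definition; the literal dates are the ones from the question and nothing else is new.

```sql
-- Same conditional counts as in the question, plus a lower bound so that
-- only index entries from the earliest cut-off onward are scanned.
SELECT
    COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, NULL)) AS 'monthly',
    COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, NULL)) AS 'weekly',
    COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, NULL)) AS '30day',
    COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, NULL)) AS 'recent'
FROM items
WHERE user_id = 123456
  AND created_at >= '2015-06-30 07:57:56';  -- earliest of the four cut-offs
```

The bound has to be the earliest of the four cut-offs. With these dates that is the '30day' value, but late in a 31-day month the start of the month can be earlier than "30 days ago", so the bound should be chosen per run rather than hard-wired to one column.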
I would add an index on the created_at field:
ALTER TABLE items ADD INDEX idx_created_at (created_at)
Or (as Thomas suggested), since you also filter on user_id, a composite index on user_id and created_at:
ALTER TABLE items ADD INDEX idx_user_created_at (user_id, created_at)
Then write your query as:
SELECT 'monthly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-01 00:00:00' AND user_id = 123456
UNION ALL
SELECT 'weekly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-26 00:00:00' AND user_id = 123456
UNION ALL
SELECT '30day' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-06-30 07:57:56' AND user_id = 123456
UNION ALL
SELECT 'recent' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-29 17:03:44' AND user_id = 123456
Yes, the output is a bit different. Or you can use inline queries:
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...
If you need averages, you can wrap this in a subquery:
SELECT
monthly,
weekly,
monthly / total,
weekly / total
FROM (
SELECT
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
(SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
...,
(SELECT COUNT(*) FROM items WHERE user_id=...) AS total
) s
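INDEX(user_id, created_at): optimal.
AND created_at >= '2015-06-30 07:57:56': helps, because it reduces the number of index entries that have to be touched.
UNION: does not help, because it leads to 4 times as much work.
SELECTs: do not help either.
Also,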
COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null))
can be shortened to
SUM(created_at >= '2015-07-29 17:03:44')
(but it probably will not speed things up much)
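Put together, the shortened SUM form might look like the following sketch. It relies on MySQL evaluating a comparison to 1 or 0. One difference from COUNT is worth keeping in mind: SUM returns NULL when no rows match the WHERE clause, so the COALESCE wrappers here are an addition of this sketch, not part of the original suggestion.

```sql
-- Each comparison evaluates to 1 or 0 in MySQL, so SUM counts the matches.
-- COALESCE turns the NULL that SUM yields for an empty row set into 0.
SELECT
    COALESCE(SUM(created_at >= '2015-07-01 00:00:00'), 0) AS 'monthly',
    COALESCE(SUM(created_at >= '2015-07-26 00:00:00'), 0) AS 'weekly',
    COALESCE(SUM(created_at >= '2015-06-30 07:57:56'), 0) AS '30day',
    COALESCE(SUM(created_at >= '2015-07-29 17:03:44'), 0) AS 'recent'
FROM items
WHERE user_id = 123456
  AND created_at >= '2015-06-30 07:57:56';
```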
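If the data does not change over time and only new rows are added, then a summary table of past data would speed things up greatly, but only if you avoid things like '07:57:56' in the '30day' boundary. (Why use '00:00:00' for only some of them?) The speedup might be another 10x on top of the other changes. Want to discuss this further?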
(I do not see any advantage in using PARTITION.)
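The summary table idea mentioned above is not spelled out in the thread, so the following is only one possible shape for it. The table name items_daily_counts, its columns, and the once-per-day roll-up schedule are all assumptions made for illustration. It also assumes that rows are never updated or deleted after insertion and that every cut-off is aligned to midnight.

```sql
-- Hypothetical summary table (name and columns are assumptions):
-- one row per user per calendar day.
CREATE TABLE items_daily_counts (
    user_id     INT NOT NULL,
    created_day DATE NOT NULL,
    cnt         INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, created_day)
) ENGINE=InnoDB;

-- Roll up one completed day (here 2015-07-28). Rerunning the statement
-- overwrites the same rows instead of double counting.
INSERT INTO items_daily_counts (user_id, created_day, cnt)
SELECT user_id, DATE(created_at), COUNT(*)
FROM items
WHERE created_at >= '2015-07-28 00:00:00'
  AND created_at <  '2015-07-29 00:00:00'
GROUP BY user_id, DATE(created_at)
ON DUPLICATE KEY UPDATE cnt = VALUES(cnt);

-- Day-aligned counts for one user now read roughly 30 summary rows
-- instead of hundreds of thousands of index entries.
SELECT
    COALESCE(SUM(IF(created_day >= '2015-07-01', cnt, 0)), 0) AS 'monthly',
    COALESCE(SUM(IF(created_day >= '2015-07-26', cnt, 0)), 0) AS 'weekly',
    COALESCE(SUM(cnt), 0)                                     AS '30day'
FROM items_daily_counts
WHERE user_id = 123456
  AND created_day >= '2015-06-30';
```

This only covers days that have already been rolled up. Rows from the current day, and any window with a time-of-day cut-off such as 'recent', would still have to be counted from items and added on top.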