
Possible to improve the performance of this SQL query?

I have a table with over 100,000,000 rows, and I have a query that looks like this:

SELECT
    COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, null)) AS 'monthly',
    COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, null)) AS 'weekly',
    COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, null)) AS '30day',
    COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null)) AS 'recent'
FROM
    items
WHERE
    user_id = 123456;

The table looks like this:

CREATE TABLE `items` (
   `user_id` int(11) NOT NULL,
   `item_id` int(11) NOT NULL,
   `created_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
    PRIMARY KEY (`user_id`,`item_id`),
    KEY `user_id` (`user_id`,`created_at`),
    KEY `created_at` (`created_at`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

The EXPLAIN output looks fairly harmless, apart from the large number of rows:

id  select_type  table  type  possible_keys    key      key_len  ref    rows    Extra
1   SIMPLE       items  ref   PRIMARY,user_id  user_id  4        const  559864  Using index

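I use the query to gather counts for a specific user over 4 time ranges. Is there a smarter/faster way to obtain the same data, or is my only option to tally these counts as new rows are inserted into this table?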

If you have an index on created_at, I would also put created_at >= '2015-06-30 07:57:56' in the WHERE clause, since that is the earliest date among your ranges.
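As a sketch of that suggestion, the original query with the lower bound added might look like the following. It assumes '2015-06-30 07:57:56' really is the earliest of the four cutoffs, as in the question; if a cutoff earlier than the WHERE bound were added, its count would be silently truncated.

```sql
-- Same four counts as the original query, but the scan over the
-- (user_id, created_at) index stops at the earliest cutoff instead of
-- walking every entry for the user.
SELECT
    COUNT(IF(created_at >= '2015-07-01 00:00:00', 1, NULL)) AS 'monthly',
    COUNT(IF(created_at >= '2015-07-26 00:00:00', 1, NULL)) AS 'weekly',
    COUNT(IF(created_at >= '2015-06-30 07:57:56', 1, NULL)) AS '30day',
    COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, NULL)) AS 'recent'
FROM
    items
WHERE
    user_id = 123456
    AND created_at >= '2015-06-30 07:57:56';  -- earliest cutoff (assumed)
```

With this filter in place, the '30day' column is simply the count of all rows that pass the WHERE clause.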

Also, with the same index, it may work to split it into 4 queries:

select count(*) AS '30day'
FROM
items
WHERE
    user_id = 123456
and created_at >= '2015-06-30 07:57:56'
union ....

and so on.

I would add an index on the created_at field:

ALTER TABLE items ADD INDEX idx_created_at (created_at)

Or (as Thomas suggested), since you are also filtering on user_id, a composite index on user_id and created_at:

ALTER TABLE items ADD INDEX idx_user_created_at (user_id, created_at)
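Before running either ALTER TABLE, it may be worth checking what already exists: the CREATE TABLE posted in the question already declares KEY `user_id` (`user_id`,`created_at`) and KEY `created_at` (`created_at`), so adding these indexes again would most likely only create duplicates. A minimal check, using only standard MySQL statements:

```sql
-- List the indexes currently defined on the table.
SHOW INDEX FROM items;

-- See which index the optimizer picks for one of the range counts.
-- The cutoff is the earliest one from the question.
EXPLAIN
SELECT COUNT(*)
FROM items
WHERE user_id = 123456
  AND created_at >= '2015-06-30 07:57:56';
```

If the plan already shows the composite (user_id, created_at) key being used, a new index is unlikely to change anything.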

Then write your query as:

SELECT 'monthly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-01 00:00:00' AND user_id = 123456

UNION ALL

SELECT 'weekly' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-26 00:00:00' AND user_id = 123456

UNION ALL

SELECT '30day' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-06-30 07:57:56' AND user_id = 123456

UNION ALL

SELECT 'recent' as description, COUNT(*) AS cnt FROM items
WHERE created_at >= '2015-07-29 17:03:44' AND user_id = 123456

Yes, the output is a bit different. Or you can use inline queries:

SELECT
  (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
  (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
  ...

If you need averages, you can use a subquery:

SELECT
  monthly,
  weekly,
  monthly / total,
  weekly / total
FROM (
  SELECT
    (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'monthly',
    (SELECT COUNT(*) FROM items WHERE created_at>=... AND user_id=...) AS 'weekly',
    ...,
    (SELECT COUNT(*) FROM items WHERE user_id=...) AS total
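) s

  • INDEX(user_id, created_at): optimal
  • AND created_at >= '2015-06-30 07:57:56': helps, because it cuts down the number of index entries that have to be touched
  • Doing a UNION does not help, since it leads to 4 times as much work.
  • Doing subquery SELECTs does not help, for the same reason.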

COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, null))

can be shortened to

SUM(created_at >= '2015-07-29 17:03:44')

(but it probably won't speed things up much)
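One difference between the two forms is worth keeping in mind: over an empty set of rows, COUNT(...) returns 0 while SUM(...) returns NULL. The sketch below illustrates this; user_id = -1 is a hypothetical value assumed to match no rows.

```sql
-- Compare the two forms when no rows match.
-- user_id = -1 is an assumption: a user with no items.
SELECT
    COUNT(IF(created_at >= '2015-07-29 17:03:44', 1, NULL)) AS count_form,       -- 0
    SUM(created_at >= '2015-07-29 17:03:44')                AS sum_form,         -- NULL
    COALESCE(SUM(created_at >= '2015-07-29 17:03:44'), 0)   AS sum_with_default  -- 0
FROM items
WHERE user_id = -1;
```

If the application expects a number, wrapping the SUM in COALESCE(..., 0) keeps the shorter form behaving like the original COUNT.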

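If the data does not change over time and rows are only ever added, then summary tables of past data would greatly speed things up, but only if you can avoid things like the '07:57:56' in '30day'. (Why do only some of the ranges use '00:00:00'?) The speedup might be another 10x on top of the other changes. Want to discuss further?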
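A minimal sketch of what such a summary table could look like, assuming all ranges can be aligned to day boundaries. The table name items_daily, the column created_day, and the once-per-day rollup schedule are hypothetical and not part of the original answer; DATE(created_at) is evaluated in the session time zone, which is a further assumption about how days should be cut.

```sql
-- Hypothetical summary table: one row per user per day.
CREATE TABLE items_daily (
    user_id     INT NOT NULL,
    created_day DATE NOT NULL,
    cnt         INT UNSIGNED NOT NULL,
    PRIMARY KEY (user_id, created_day)
) ENGINE=InnoDB;

-- Roll up one completed day (for example, run shortly after midnight).
-- Re-running it for the same day overwrites the stored count.
INSERT INTO items_daily (user_id, created_day, cnt)
SELECT user_id, DATE(created_at), COUNT(*)
FROM items
WHERE created_at >= '2015-07-28 00:00:00'
  AND created_at <  '2015-07-29 00:00:00'
GROUP BY user_id, DATE(created_at)
ON DUPLICATE KEY UPDATE cnt = VALUES(cnt);

-- Day-aligned counts now touch at most about 30 rows per user.
SELECT
    COALESCE(SUM(IF(created_day >= '2015-07-01', cnt, 0)), 0) AS monthly,
    COALESCE(SUM(IF(created_day >= '2015-07-26', cnt, 0)), 0) AS weekly,
    COALESCE(SUM(cnt), 0)                                     AS thirty_day
FROM items_daily
WHERE user_id = 123456
  AND created_day >= '2015-06-30';
```

This only covers completed days. Counts for the current, partial day (and anything like the 'recent' range with a time-of-day cutoff) would still have to come from the items table and be added to the summary figures.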

(I don't see any advantage in using PARTITION.)

