
Keep a subset of records separately for MySQL query performance

I have a large table containing over 10 million records, and it will keep growing. I am running an aggregation query (a count of particular values) on the records from the last 24 hours. The time taken by this query will keep increasing with the number of records in the table.

I could limit the time taken by keeping the last 24 hours of records in a separate table and running the aggregation on that table. Does MySQL provide any functionality to handle this kind of scenario?

Table schema and query for reference:

CREATE TABLE purchases (
    Id int(11) NOT NULL AUTO_INCREMENT, 
    ProductId int(11) NOT NULL, 
    CustomerId int(11) NOT NULL, 
    PurchaseDateTime datetime(3) NOT NULL, 
    PRIMARY KEY (Id), 
    KEY ix_purchases_PurchaseDateTime (PurchaseDateTime) USING BTREE, 
    KEY ix_purchases_ProductId (ProductId) USING BTREE, 
    KEY ix_purchases_CustomerId (CustomerId) USING BTREE
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

select COALESCE(sum(ProductId = v_ProductId), 0),
       COALESCE(sum(CustomerId = v_CustomerId), 0)
    into v_ProductCount, v_CustomerCount
    from purchases
    where PurchaseDateTime > NOW() - INTERVAL 1 DAY
      and (   ProductId = v_ProductId
           or CustomerId = v_CustomerId );

Build and maintain a separate summary table.

With partitioning, you might get a small improvement, or you might get no improvement. With a summary table, you might get a factor of 10 improvement.

The summary table could have a 1-day resolution, or you might need 1-hour resolution. Please provide SHOW CREATE TABLE for what you currently have, so we can discuss more specifics.

(There is no built-in mechanism for what you want.)
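As a rough illustration of the summary-table idea, here is a minimal sketch for the per-product count at 1-hour resolution. The table name purchases_by_product_hourly and its columns are hypothetical (nothing like them appears in the question), and the statements are assumed to run inside the same stored routine that declares v_ProductId and v_ProductCount. A per-customer table would follow the same pattern.

```sql
-- Hypothetical summary table: one row per product per hour.
CREATE TABLE purchases_by_product_hourly (
    ProductId     INT NOT NULL,
    HourStart     DATETIME NOT NULL,
    PurchaseCount INT UNSIGNED NOT NULL,
    PRIMARY KEY (ProductId, HourStart)
) ENGINE=InnoDB;

-- Maintenance: run alongside each INSERT into purchases.
-- Creates the hour's row on first use, otherwise bumps its counter.
INSERT INTO purchases_by_product_hourly (ProductId, HourStart, PurchaseCount)
VALUES (v_ProductId, DATE_FORMAT(NOW(), '%Y-%m-%d %H:00:00'), 1)
ON DUPLICATE KEY UPDATE PurchaseCount = PurchaseCount + 1;

-- Read side: sums at most about 24 small rows per product.
SELECT COALESCE(SUM(PurchaseCount), 0)
  INTO v_ProductCount
  FROM purchases_by_product_hourly
 WHERE ProductId = v_ProductId
   AND HourStart > NOW() - INTERVAL 1 DAY;
```

With hourly buckets the window is only accurate to the hour boundary: the oldest, partially covered hour is dropped. If an exact 24-hour count matters, that partial hour would need to be counted from purchases itself.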

Plan A

I would leave off

      and (   ProductId = v_ProductId
           or CustomerId = v_CustomerId )

since the SUM(...) expressions in the SELECT list already count only the matching rows.

Then I would add

INDEX(PurchaseDateTime, ProductId, CustomerId)

which would be "covering", that is, the entire SELECT can be performed in the index's BTree. It would also be "clustered" in the sense that all the data needed would be stored consecutively in the index. Yes, the datetime is deliberately first. (OR is a nuisance to optimize. I don't trust the Optimizer to do "index merge union".)
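Put together, Plan A might look like the sketch below. The index name ix_purchases_dt_product_customer and the literal ids 123 and 456 in the EXPLAIN are placeholders chosen for illustration; the SELECT ... INTO is assumed to stay inside the stored routine that declares the v_ variables.

```sql
-- Add the covering index (the name is hypothetical).
ALTER TABLE purchases
    ADD INDEX ix_purchases_dt_product_customer
        (PurchaseDateTime, ProductId, CustomerId);

-- The original query with the OR clause removed.
SELECT COALESCE(SUM(ProductId = v_ProductId), 0),
       COALESCE(SUM(CustomerId = v_CustomerId), 0)
  INTO v_ProductCount, v_CustomerCount
  FROM purchases
 WHERE PurchaseDateTime > NOW() - INTERVAL 1 DAY;

-- Sanity check outside the routine, with placeholder ids.
-- When the index covers the query, the Extra column of the
-- EXPLAIN output normally includes "Using index".
EXPLAIN
SELECT SUM(ProductId = 123), SUM(CustomerId = 456)
  FROM purchases
 WHERE PurchaseDateTime > NOW() - INTERVAL 1 DAY;
```

This scans every index entry from the last 24 hours, so its cost tracks the daily purchase volume rather than the total size of the table.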

Plan B

If you expect to touch very few rows (because of v_ProductId and v_CustomerId), then the following may be faster, in spite of being more complex:

SELECT COALESCE(sum(ProductId = v_ProductId), 0)
    INTO v_ProductCount
    FROM purchases
    WHERE PurchaseDateTime > NOW() - INTERVAL 1 DAY
      AND ProductId = v_ProductId;
SELECT COALESCE(sum(CustomerId = v_CustomerId), 0)
    INTO v_CustomerCount
    FROM purchases
    WHERE PurchaseDateTime > NOW() - INTERVAL 1 DAY
      AND CustomerId = v_CustomerId;

together with both of these indexes:

INDEX(ProductId, PurchaseDateTime),
INDEX(CustomerId, PurchaseDateTime)

Yes, the order of the columns is deliberately different from Plan A: the column tested with "=" comes first, followed by the column used for the range.
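A possible way to apply Plan B is sketched here. The new index names are hypothetical. Dropping the old single-column indexes is optional and is an assumption on my part: each is a leading prefix of one of the new composite indexes, so it should be redundant, but check that nothing else depends on them by name first.

```sql
-- Add the two composite indexes; optionally drop the single-column
-- indexes that they make redundant.
ALTER TABLE purchases
    ADD INDEX ix_purchases_product_dt  (ProductId, PurchaseDateTime),
    ADD INDEX ix_purchases_customer_dt (CustomerId, PurchaseDateTime),
    DROP INDEX ix_purchases_ProductId,
    DROP INDEX ix_purchases_CustomerId;

-- Because the WHERE clause already restricts the rows to one product,
-- COUNT(*) gives the same result as COALESCE(SUM(ProductId = ...), 0):
-- an ungrouped COUNT(*) returns 0, not NULL, when no rows match.
SELECT COUNT(*)
  INTO v_ProductCount
  FROM purchases
 WHERE ProductId = v_ProductId
   AND PurchaseDateTime > NOW() - INTERVAL 1 DAY;
```

The same COUNT(*) form applies to the per-customer query.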

Original Question

Both of these approaches are better than your original suggestion of a separate table. They isolate the relevant data in one part of an index (or of two indexes), which has the same effect as keeping it "separate". And they do the task with less effort on your part.
