
MySQL: Optimizing query for records within date range

I have a table (logs) that has the following columns (there are others, but these are the important ones):

  • id (PK, int)
  • Timestamp (datetime) (index)
  • Duration (int, in seconds)

Basically, each row is a record of an event that starts at one time and ends at a later time. This table currently has a few hundred thousand rows, and I expect it to grow to millions. To speed up queries, I have added another column with precomputed values:

  • EndTime (datetime) (index)

To calculate EndTime, I added the number of seconds in Duration to the Timestamp field.
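The precomputation can be done as a one-time UPDATE. In MySQL the statement would presumably be UPDATE logs SET EndTime = DATE_ADD(Timestamp, INTERVAL Duration SECOND); the runnable sketch below instead uses an in-memory SQLite database through Python's built-in sqlite3 module (an assumption made so it runs without a server), where SQLite's datetime(value, '+N seconds') modifier plays the role of DATE_ADD. The sample rows and the index name idx_endtime are made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the MySQL `logs` table (SQLite, not MySQL).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logs (id INTEGER PRIMARY KEY, Timestamp TEXT, "
    "Duration INTEGER, EndTime TEXT)"
)
conn.executemany(
    "INSERT INTO logs (id, Timestamp, Duration) VALUES (?, ?, ?)",
    [(1, "2010-04-13 09:45:00", 90), (2, "2010-04-13 09:45:30", 5)],
)

# One-time backfill: EndTime = Timestamp + Duration seconds.
# '+' || Duration || ' seconds' builds a modifier such as '+90 seconds'.
conn.execute(
    "UPDATE logs SET EndTime = datetime(Timestamp, '+' || Duration || ' seconds')"
)
# Index the precomputed column, as the question does.
conn.execute("CREATE INDEX idx_endtime ON logs (EndTime)")

rows = conn.execute("SELECT id, EndTime FROM logs ORDER BY id").fetchall()
print(rows)
```

Keeping EndTime correct for rows inserted later is a separate concern (for example, computing it in the same INSERT).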

Now I want to run a query that counts the rows whose start time (Timestamp) and end time (EndTime) span a given point in time, i.e. rows where Timestamp <= point <= EndTime. I then want to run this query for every second over a large timespan (such as a year). I would also like to count the rows that start at that particular point in time and the rows that end at it.

I have created the following query:

SELECT 
    `dates`.`date`, 
    COUNT(*) AS `total`, 
    SUM(IF(`dates`.`date`=`logs`.`Timestamp`, 1, 0)) AS `new`,
    SUM(IF(`dates`.`date`=`logs`.`EndTime`, 1, 0)) AS `dropped` 
FROM 
    `logs`,
    (SELECT 
        DATE_ADD("2010-04-13 09:45:00", INTERVAL `number` SECOND) AS `date` 
        FROM numbers LIMIT 120) AS dates
WHERE dates.`date` BETWEEN `logs`.`Timestamp` AND `logs`.`EndTime` 
GROUP BY `dates`.`date`;

Note that the numbers table exists only to make it easy to enumerate a date range. It has a single column, number, containing the values 1, 2, 3, 4, 5, etc.
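As a runnable illustration of the numbers-table trick, the sketch below again uses an in-memory SQLite database via Python's sqlite3 module rather than MySQL (an assumption), with datetime(base, '+N seconds') standing in for DATE_ADD(base, INTERVAL number SECOND). Two details carry over to the MySQL query: because numbers starts at 1, the generated series starts one second after the base time (09:45:01, not 09:45:00), and a LIMIT without ORDER BY does not formally guarantee which rows are returned, so the sketch adds ORDER BY number. The 1,000-row table is a made-up stand-in for the much larger one in the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical helper table with one column `number` holding 1, 2, 3, ...
conn.execute("CREATE TABLE numbers (number INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO numbers (number) VALUES (?)", [(n,) for n in range(1, 1001)]
)

# SQLite analogue of the derived table `dates` in the question's query.
# ORDER BY makes LIMIT pick the 120 smallest numbers deterministically.
dates = [
    row[0]
    for row in conn.execute(
        "SELECT datetime('2010-04-13 09:45:00', '+' || number || ' seconds') "
        "FROM numbers ORDER BY number LIMIT 120"
    )
]
print(dates[0], dates[-1], len(dates))
```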

This gives me exactly what I am looking for: a table with 4 columns:

  • date
  • total (rows whose interval spans the current point in time, i.e. Timestamp <= date <= EndTime)
  • new (rows that start at this point in time)
  • dropped (rows that end at this point in time)
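To pin down exactly what the query computes, here is a brute-force Python sketch that mirrors it row for row. The function name counts_per_second and its input shape (a list of (Timestamp, EndTime) datetime pairs) are hypothetical. Two details of the SQL carry over: BETWEEN is inclusive at both ends, and because the comma join is an inner join, a second that no row spans produces no output row at all. The nested loop also suggests why the query is slow: in the worst case every second is compared against every row, which for 120 seconds and the 296,159 rows MySQL estimates in the EXPLAIN output below is roughly 35.5 million comparisons.

```python
from datetime import datetime, timedelta


def counts_per_second(logs, start, seconds):
    """Brute-force mirror of the question's query.

    logs: hypothetical list of (timestamp, end_time) datetime pairs.
    Returns (date, total, new, dropped) tuples for start+1s .. start+seconds,
    matching a numbers table that starts at 1.
    """
    out = []
    for n in range(1, seconds + 1):
        d = start + timedelta(seconds=n)
        # Same inclusive test as `d BETWEEN Timestamp AND EndTime`.
        matching = [(ts, end) for ts, end in logs if ts <= d <= end]
        if not matching:
            continue  # the inner join yields no group for this second
        new = sum(1 for ts, _ in matching if ts == d)
        dropped = sum(1 for _, end in matching if end == d)
        out.append((d, len(matching), new, dropped))
    return out


# Usage: two overlapping events starting at 09:45:01 and 09:45:02.
t0 = datetime(2010, 4, 13, 9, 45, 0)
logs = [
    (t0 + timedelta(seconds=1), t0 + timedelta(seconds=3)),
    (t0 + timedelta(seconds=2), t0 + timedelta(seconds=2)),
]
for d, total, new, dropped in counts_per_second(logs, t0, 4):
    print(d.strftime("%H:%M:%S"), total, new, dropped)
# 09:45:01 1 1 0
# 09:45:02 2 1 1
# 09:45:03 1 0 1
```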

The trouble is that this query can take a significant amount of time to execute. Covering 120 seconds (as in the query above) takes about 10 seconds. I suspect this is about as fast as it will get, but I thought I would ask whether anyone has ideas for improving its performance.

Any suggestions would be most helpful. Thank you for your time.

Edit: I have indexes on Timestamp and EndTime.

The output of EXPLAIN on my query:

"id";"select_type";"table";"type";"possible_keys";"key";"key_len";"ref";"rows";"Extra"
"1";"PRIMARY";"<derived2>";"ALL";NULL;NULL;NULL;NULL;"120";"Using temporary; Using filesort"
"1";"PRIMARY";"logs";"ALL";"Timestamp,EndTime";NULL;NULL;NULL;"296159";"Range checked for each record (index map: 0x6)"
"2";"DERIVED";"numbers";"index";NULL;"PRIMARY";"4";NULL;"35546940";"Using index"

When I run ANALYZE TABLE on my logs table, it reports status OK.

Note in the EXPLAIN output that the join type for the logs table is "ALL" and the key is NULL, which means a full table scan is scheduled. The "Range checked for each record" message means that MySQL uses the range access method on logs after examining column values from somewhere else in the result. I take this to mean that once dates has been created, MySQL can perform a ranged join on logs using the second and third indexes (index map 0x6 has bits 1 and 2 set; these are likely the indexes on Timestamp and EndTime) rather than performing a full table scan. If you only have separate indexes on Timestamp and EndTime, try adding a composite index on both, which might result in a more efficient access plan:

CREATE INDEX `start_end` ON `logs` (`Timestamp`, `EndTime`);

I believe (though I could easily be wrong) that the other items in the query plan either aren't really a concern or can't be eliminated. The filesort, as an example of the latter, is likely due to the GROUP BY. In other words, this is likely the extent of what you can do with this particular query, though radically different queries, or approaches that change the table storage format, may still be more efficient.
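One example of a radically different approach (a swapped-in technique, not something either answer here proposes) is a sweep line: fetch the rows that overlap the whole window once, e.g. with WHERE Timestamp <= <window end> AND EndTime >= <window start>, then compute the per-second counts in application code with a difference array. This sketch assumes whole-second datetimes and EndTime >= Timestamp; the function name counts_sweep and its input shape (a list of (Timestamp, EndTime) pairs, with the window running from start + 1 s to start + seconds, like a numbers table starting at 1) are hypothetical. Each row and each second is touched once, so the cost grows with rows + seconds rather than rows × seconds.

```python
from datetime import datetime, timedelta


def counts_sweep(logs, start, seconds):
    """Per-second (date, total, new, dropped) via a difference array.

    logs: hypothetical list of (timestamp, end_time) datetime pairs, assumed to
    hold whole seconds with end_time >= timestamp. The window is
    start+1s .. start+seconds (offsets 1..seconds).
    """
    delta = [0] * (seconds + 2)    # delta[i]: change in the running total at offset i
    new = [0] * (seconds + 1)      # rows starting exactly at offset i
    dropped = [0] * (seconds + 1)  # rows ending exactly at offset i
    for ts, end in logs:
        lo = int((ts - start).total_seconds())
        hi = int((end - start).total_seconds())
        if hi < 1 or lo > seconds:
            continue  # no overlap with the window
        if lo >= 1:
            new[lo] += 1
        if hi <= seconds:
            dropped[hi] += 1
        # The row covers offsets max(lo, 1) .. min(hi, seconds), inclusive.
        delta[max(lo, 1)] += 1
        delta[min(hi, seconds) + 1] -= 1
    out, running = [], 0
    for i in range(1, seconds + 1):
        running += delta[i]
        if running:  # skip empty seconds, as the inner join does
            out.append((start + timedelta(seconds=i), running, new[i], dropped[i]))
    return out


# Usage: one event spans the whole window, another starts and ends inside it.
t0 = datetime(2010, 4, 13, 9, 45, 0)
logs = [
    (t0 - timedelta(seconds=10), t0 + timedelta(seconds=100)),
    (t0 + timedelta(seconds=2), t0 + timedelta(seconds=3)),
]
for d, total, new_count, dropped_count in counts_sweep(logs, t0, 4):
    print(d.strftime("%H:%M:%S"), total, new_count, dropped_count)
# 09:45:01 1 0 0
# 09:45:02 2 1 0
# 09:45:03 2 0 1
# 09:45:04 1 0 0
```

The trade-off is moving the aggregation out of MySQL: the single overlapping-rows query is still a range predicate, but it runs once per window rather than once per second.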

You could look at MERGE tables to speed up the processing. With MERGE tables, the data is split across several underlying tables, so each index is smaller, which results in faster fetching. Also, if you have multiple processors, the searches can happen in parallel, increasing performance.

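Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.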

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM