MySQL date query always doing a full scan

I am trying to do a supposedly very simple query. I have a table with a DATETIME column holding timestamps.

I need to find all parent table rows which do not have a timestamp within the last 5 minutes. The window may vary from row to row, as described below. I have read a number of articles and tried changing my query many times, but it still does not use the index properly.

1) The access table shown below may have more than one row per mon.id.
2) I need to find all mon.id values which do not have a row in the access table with a lastaccess_date within the last mon.duration minutes.
3) Because the access table may have more than one row per mon.id, the row with the latest timestamp is the one that needs to be checked against the duration.

Tables are as below:

mon (parent)
-----------
id,payload,duration

access (child)
---
id,mon_id,lastaccess_date

Current query is:

select id,payload,elapsed,duration from 
(SELECT mon.id,payload,TIMESTAMPDIFF(MINUTE, lastaccess_date, NOW()) as elapsed,duration
    FROM mon
    inner JOIN access_log log on mon.id=log.monitor_id
order by lastaccess_date desc
 ) as t1
GROUP BY id
having elapsed>duration

I also tried a number of other queries, but they do not seem to be efficient. With 100 rows in the table, these queries do not use the index and do a full table scan instead.

Please suggest an efficient query that can use indexes. If required, I can tweak the table design a bit if that helps in this case.

The MySQL EXPLAIN of this query is something like below:

[image: EXPLAIN output for the query above]

EDIT: As per the comment, and as I had already tried before, I even reduced the query drastically to:

select monitor_id
  from access_log
 WHERE access_dt not between date_sub(now(),INTERVAL 5 MINUTE) and now()

Now I am not wrapping the access_dt DATETIME column in any function in the WHERE clause, but it is still doing a full table scan. The query returns 40 rows out of 100 rows in this test scenario.

Here is the EXPLAIN now:

id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
'1', 'SIMPLE', 'access_log', 'ALL', 'access_dt', NULL, NULL, NULL, '100', '100.00', 'Using where'

There are several possible reasons why your second query's EXPLAIN is not what you expect.

First of all, don't waste time worrying about the EXPLAIN results for small tables. This is a tiny table right now, and your query is returning 40 of its 100 rows. The MySQL query planner may not have chosen the index simply because it did not seem selective enough to be worth the trouble of paging it into RAM and using it. If that's the case, the situation may change as your tables grow.
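One way to check whether the planner is skipping the index on cost grounds is to compare its own plan with one where the index is hinted. This is only a diagnostic sketch; it assumes the index is named access_dt, as the possible_keys column of your EXPLAIN suggests.

```sql
-- Plan the optimizer picks on its own (your EXPLAIN shows type = ALL here).
EXPLAIN
SELECT monitor_id
  FROM access_log
 WHERE access_dt NOT BETWEEN DATE_SUB(NOW(), INTERVAL 5 MINUTE) AND NOW();

-- Same query with an index hint, for comparison only.
-- The index name access_dt is an assumption taken from possible_keys.
EXPLAIN
SELECT monitor_id
  FROM access_log FORCE INDEX (access_dt)
 WHERE access_dt NOT BETWEEN DATE_SUB(NOW(), INTERVAL 5 MINUTE) AND NOW();
```

If the hinted plan is no faster on realistic data volumes, the planner's choice of a full scan was probably reasonable. The hint is meant for investigation rather than as something to leave in production queries.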

Second, you have this clause:

WHERE access_dt not between date_sub(now(),INTERVAL 5 MINUTE) 
                        and now()

The NOT may prove to be unhelpful, because it gets evaluated as if it were

WHERE (    access_dt < date_sub(now(),INTERVAL 5 MINUTE)
        OR access_dt > now() )

OR clauses are no fun for MySQL to evaluate. If you happen to know that access_dt values cannot be in the future, you can write

WHERE access_dt < date_sub(now(), INTERVAL 5 MINUTE)

and that's eligible for an index range scan.
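To see whether the one-sided predicate changes the plan, you could run EXPLAIN on the simplified query. A minimal sketch, assuming access_dt never holds future values:

```sql
-- One-sided range predicate on the indexed DATETIME column.
-- In the output, check the type column: "range" means an index range scan,
-- "ALL" means the planner still preferred a full table scan.
EXPLAIN
SELECT monitor_id
  FROM access_log
 WHERE access_dt < DATE_SUB(NOW(), INTERVAL 5 MINUTE);
```

On a 100-row table the planner may still choose a full scan, so this is more informative once the table holds a realistic number of rows.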

Third, you appear to be misusing GROUP BY in your first query. Do you mean ORDER BY? With GROUP BY id, MySQL is free to take the non-aggregated elapsed value from any row in the group, so the ORDER BY in the subquery does not guarantee that you get the latest one. It's hard to figure out what you need. Read this: http://dev.mysql.com/doc/refman/5.6/en/group-by-handling.html
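If the goal is to test the latest access per monitor, one way to express that without depending on row order inside a subquery is to aggregate with MAX(). This is a sketch of one reading of the requirement, using the column names from the query in the question (monitor_id, lastaccess_date); it may not be exactly what was intended.

```sql
-- For each monitor, compute minutes since its most recent access
-- and keep only monitors whose latest access is older than their duration.
SELECT mon.id,
       mon.payload,
       mon.duration,
       TIMESTAMPDIFF(MINUTE, MAX(log.lastaccess_date), NOW()) AS elapsed
  FROM mon
 INNER JOIN access_log log ON mon.id = log.monitor_id
 GROUP BY mon.id, mon.payload, mon.duration
HAVING elapsed > mon.duration;
```

Because this uses an inner join, monitors with no rows at all in access_log are not returned. If those should count as overdue too, a LEFT JOIN with an explicit check for a NULL MAX() would be needed.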

Finally, let's take a look at the inner query in your first query and try to optimize it. You started with this, which I have edited to show the table each column comes from.

SELECT mon.id, mon.payload,
      TIMESTAMPDIFF(MINUTE, log.lastaccess_date, NOW()) as elapsed,
      mon.duration
 FROM mon
inner JOIN access_log log ON mon.id=log.monitor_id
order by log.lastaccess_date desc

Let's adjust this by adding the timestamp selection criterion to your ON clause.

  ...
  FROM mon
 INNER JOIN access_log log
       ON mon.id = log.monitor_id
     AND log.lastaccess_date < DATE_SUB(NOW(),INTERVAL mon.duration MINUTE)

That will select the rows you want. When your tables get relatively large (at least 10K rows in access_log), you should experiment with the following two compound indexes to see which one gives you better results.

 (monitor_id, lastaccess_date)
 (lastaccess_date, monitor_id)
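The two candidate indexes could be created as follows. The index names idx_monitor_date and idx_date_monitor are made up for illustration; the column names are the ones used in the query above.

```sql
-- Candidate 1: monitor first, then timestamp.
-- Suits a per-monitor lookup driven from the mon table.
ALTER TABLE access_log
  ADD INDEX idx_monitor_date (monitor_id, lastaccess_date);

-- Candidate 2: timestamp first, then monitor.
-- Suits a range scan on lastaccess_date that also needs monitor_id.
ALTER TABLE access_log
  ADD INDEX idx_date_monitor (lastaccess_date, monitor_id);
```

Running EXPLAIN on the join after adding each index shows which one the planner actually picks; the unused one can then be dropped with ALTER TABLE access_log DROP INDEX.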
