Why doesn't this index work (MySQL)
I have this table:
CREATE TABLE `maindb`.`daily_info` (
`di_date` date NOT NULL,
`di_sid` int(10) unsigned NOT NULL default '0',
`di_type` int(10) unsigned NOT NULL default '0',
`di_name` varchar(20) NOT NULL default '',
`di_num` int(10) unsigned NOT NULL default '0',
`di_abt` varchar(1) NOT NULL default 'a',
PRIMARY KEY (`di_date`,`di_sid`,`di_type`,`di_name`,`di_abt`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;
When I use this query:
explain
SELECT MONTH(di_date) as label1, DAYOFMONTH(di_date) as label2, sum(di_num) as count , di_abt as abt
FROM `daily_info`
WHERE di_sid=6
AND di_type = 4
AND di_name='clk-1'
AND di_date > '2009-10-01' AND di_date < '2009-10-16'
GROUP BY
DAYOFMONTH(di_date)
ORDER BY
TO_DAYS(di_date) DESC
I get:
1, 'SIMPLE', 'daily_info', 'range', 'PRIMARY', 'PRIMARY', '3', '', 2500, 'Using where; Using temporary; Using filesort'
If the key actually worked and the query were filtered by di_date, di_sid and di_type, it would need to search only a few dozen rows.
What is wrong with the index (or the query)?
Thanks!
You use a range condition on the first index column, which kills the possibility of using the index to filter on the other columns.
There is no single contiguous range in this index which would contain those and only those records that satisfy the condition.
MySQL is not able to do a SKIP SCAN, which would jump over the distinct values of di_date. That's why it does its best: it uses range access to filter on di_date and uses WHERE to filter on all other fields.
Either recreate the index like this (the best option):
PRIMARY KEY (`di_sid`,`di_type`,`di_name`,`di_date`,`di_abt`)
or, if you're unable to recreate the index, you can emulate the SKIP SCAN:
SELECT MONTH(di.di_date) as label1, DAYOFMONTH(di.di_date) as label2, sum(di.di_num) as count , di.di_abt as abt
FROM (
SELECT DISTINCT di_date
FROM daily_info
WHERE di_date > '2009-10-01' AND di_date < '2009-10-16'
) do
JOIN daily_info di
ON di.di_date <= do.di_date
AND di.di_date>= do.di_date
AND di_sid = 6
AND di_type = 4
AND di_name = 'clk-1'
GROUP BY
DAYOFMONTH(di.di_date)
ORDER BY
TO_DAYS(di.di_date) DESC
Make sure that "Using index for group-by" and "Range checked for each record" are present in the plan.
This condition:
di.di_date <= do.di_date
AND di.di_date >= do.di_date
is used instead of a simple di.di_date = do.di_date to force range checking.
See this article in my blog for a more detailed explanation of emulating SKIP SCAN:
Update:
The latter query actually uses an equijoin, and MySQL optimizes it without the tricks.
The trick above applies only to ranged queries, i.e. when the innermost loop should use range access, not ref access.
It would be useful if you had to do something like di_name <= 'clk-1'.
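As a rough illustration of that ranged case, here is a sketch of the same emulation with a range on di_name. It assumes the original primary key order (di_date, di_sid, di_type, di_name, di_abt). The derived-table alias dates, the column alias total and the simplified select list are choices made for this example only. Whether the optimizer actually picks "Range checked for each record" depends on the server version and the data, so check the plan with EXPLAIN.

```sql
EXPLAIN
SELECT di.di_date, SUM(di.di_num) AS total
FROM (
    -- one row per distinct date in the wanted interval
    SELECT DISTINCT di_date
    FROM daily_info
    WHERE di_date > '2009-10-01' AND di_date < '2009-10-16'
) dates
JOIN daily_info di
  -- written as a pair of inequalities instead of "=" on purpose
  ON di.di_date <= dates.di_date
 AND di.di_date >= dates.di_date
 AND di.di_sid = 6
 AND di.di_type = 4
 AND di.di_name <= 'clk-1'   -- range condition on a later key column
GROUP BY di.di_date;
```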
For your query as posted, this should work fine:
SELECT MONTH(di.di_date) as label1, DAYOFMONTH(di.di_date) as label2, sum(di.di_num) as count , di.di_abt as abt
FROM (
SELECT DISTINCT di_date
FROM daily_info
WHERE di_date > '2009-10-01' AND di_date < '2009-10-16'
) do
JOIN daily_info di
ON di.di_date = do.di_date
AND di_sid = 6
AND di_type = 4
AND di_name = 'clk-1'
GROUP BY
DAYOFMONTH(di.di_date)
ORDER BY
TO_DAYS(di.di_date) DESC
Make sure that di uses ref access on the whole subkey possible here, with key_len = 33.
Update 2
In your query, you are using these expressions outside of the GROUP BY:
MONTH(di_date)
TO_DAYS(di_date)
di_abt
The query as it is now will sum all values for the 1st, 2nd, etc. of any month and year.
I.e. for the first group it will add up all values from Jan 1st, 2000, then Feb 1st, 2000, etc.
Then it will return an arbitrary value of MONTH, an arbitrary value of TO_DAYS and an arbitrary value of di_abt from each group.
Your condition currently falls within a single month, so it's OK for now, but if your condition spans multiple months (to say nothing of years), the query will produce unexpected results.
Do you really want to group by dates?
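If the intent is one row per calendar day, a minimal sketch is to group on di_date itself, so that MONTH and DAYOFMONTH are fully determined by the group. Adding di_abt to the GROUP BY is an assumption about the desired output (one row per day and per di_abt value). Drop it from both the select list and the GROUP BY if a single total per day is wanted.

```sql
SELECT MONTH(di_date)      AS label1,
       DAYOFMONTH(di_date) AS label2,
       SUM(di_num)         AS `count`,
       di_abt              AS abt
FROM daily_info
WHERE di_sid = 6
  AND di_type = 4
  AND di_name = 'clk-1'
  AND di_date > '2009-10-01' AND di_date < '2009-10-16'
-- group on the full date, not only on the day of month
GROUP BY di_date, di_abt
ORDER BY di_date DESC;
```

Written this way, the result stays well defined when the date range crosses a month or year boundary.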
You are range-scanning the first part of the index, so the subsequent parts of the index cannot be used.
The way to improve this is to create another index with the fields in a different order, one that is more conducive to this particular query.
If your index were on di_sid, di_type, di_date then it might work better.
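A sketch of how that could look in practice, assuming the table from the question. The index name ix_sid_type_date is made up for this example. The commented-out alternative is the primary key reordering suggested in the other answer; it rebuilds the table, so try it on a copy first.

```sql
-- Secondary index with the equality columns first and the date last
ALTER TABLE daily_info
  ADD INDEX ix_sid_type_date (di_sid, di_type, di_date);

-- Alternative: reorder the primary key itself
-- ALTER TABLE daily_info
--   DROP PRIMARY KEY,
--   ADD PRIMARY KEY (di_sid, di_type, di_name, di_date, di_abt);

-- Re-check the plan: key_len should now cover more than the 3 bytes of di_date
EXPLAIN
SELECT MONTH(di_date) AS label1, DAYOFMONTH(di_date) AS label2,
       SUM(di_num) AS `count`, di_abt AS abt
FROM daily_info
WHERE di_sid = 6
  AND di_type = 4
  AND di_name = 'clk-1'
  AND di_date > '2009-10-01' AND di_date < '2009-10-16'
GROUP BY DAYOFMONTH(di_date)
ORDER BY TO_DAYS(di_date) DESC;
```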