Why does adding a WHERE clause (on a column with an index) to my query increase my run time from seconds to minutes?
My problem is with this query in MySQL:
select
SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
FROM parts
-- where parttypeid = 1
When the where clause is uncommented, my run time jumps from 4.5 seconds to 341 seconds. There are approximately 21M total records in this table.
My EXPLAIN looks like this, which seems to indicate that it is utilizing the INDEX I have on PartTypeId.
id  select_type  table  type  possible_keys  key         key_len  ref    rows      Extra
1   SIMPLE       parts  ref   PartTypeId     PartTypeId  1        const  11090057
I created my table using this query:
CREATE TABLE IF NOT EXISTS parts (
Id INTEGER NOT NULL PRIMARY KEY,
PartTypeId TINYINT NOT NULL,
OrderThreshold INTEGER NOT NULL,
PartName VARCHAR(500),
INDEX(Id),
INDEX(PartTypeId),
INDEX(OrderThreshold)
);
The query without the WHERE returns:
LOW_COUNT HIGH_COUNT
3570 3584
With the where, the results look like this:
LOW_COUNT HIGH_COUNT
2791 2147
How can I improve the performance of my query to keep the run time in the seconds (instead of minutes) range when adding a where clause that only looks at one column?
Try
select SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1
and OrderThreshold not between @LOW_COST and @HIGH_COST
and
select count(*) as LOW_COUNT, null as HIGH_COUNT
from parts
where parttypeid = 1
and OrderThreshold < @LOW_COST
union all
select null, count(*)
from parts
where parttypeid = 1
and OrderThreshold > @HIGH_COST
Your accepted answer doesn't explain what is going wrong with your original query:
select SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1;
The index is being used to find the results, but there are a lot of rows with parttypeid = 1 (the EXPLAIN estimate is about 11 million of the roughly 21 million rows). I am guessing that each data page probably has at least one such row. That means that all the rows are being fetched, but they are being read out of order. That is slower than just doing a full table scan (as in the first query). In other words, all the data pages are being read, but the index is adding additional overhead.
As Juergen points out, a better form of the query moves the conditions into the where clause:
select SUM(OrderThreshold < @LOW_COST) as LOW_COUNT,
SUM(OrderThreshold > @HIGH_COST) as HIGH_COUNT
from parts
where parttypeid = 1 AND
(OrderThreshold < @LOW_COST OR OrderThreshold > @HIGH_COST)
(I prefer this form, because the where conditions match the case conditions.) For this query, you want an index on parts(parttypeid, OrderThreshold). I'm not sure about the MySQL optimizer in this case, but it might be better to write as:
select 'Low' as which, count(*) as CNT
from parts
where parttypeid = 1 AND
OrderThreshold < @LOW_COST
union all
select 'High', count(*) as CNT
from parts
where parttypeid = 1 AND
OrderThreshold > @HIGH_COST;
Each subquery should definitely use the index in this case. (If you want them in one row with two columns, there are a couple of ways to achieve that, but I'm guessing that is not so important.)
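As an illustration of one such way (a sketch, not necessarily the form the answerer had in mind), the two counts can be written as scalar subqueries in a single select list. The column aliases are reused from the question.

```sql
-- Each scalar subquery is an independent range count, so each can use
-- an index on (PartTypeId, OrderThreshold) if one exists.
SELECT
    (SELECT COUNT(*)
       FROM parts
      WHERE PartTypeId = 1
        AND OrderThreshold < @LOW_COST)  AS LOW_COUNT,
    (SELECT COUNT(*)
       FROM parts
      WHERE PartTypeId = 1
        AND OrderThreshold > @HIGH_COST) AS HIGH_COUNT;
```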
Unfortunately, the best index for your query without the where clause is parts(OrderThreshold). This is a different index from the above.
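For reference, a minimal sketch of adding the composite index for the filtered query follows. The name idx_parts_type_threshold is a hypothetical choice, and the table from the question already carries a single-column index on OrderThreshold for the unfiltered case.

```sql
-- Hypothetical index name; the column order matters: the equality column
-- (PartTypeId) comes first, the range column (OrderThreshold) second.
CREATE INDEX idx_parts_type_threshold
    ON parts (PartTypeId, OrderThreshold);

-- Re-check the plan for one of the range counts after adding the index.
EXPLAIN
SELECT COUNT(*)
FROM parts
WHERE PartTypeId = 1
  AND OrderThreshold < @LOW_COST;
```

Since both referenced columns are in the index, the plan may be able to answer the count from the index alone, without touching the table rows, though that depends on the optimizer and the MySQL version.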