How do I remove temporary and filesort from my SQL query?

I have been trying to create an index in MySQL, but I keep getting "Using temporary" and "Using filesort" whenever I run EXPLAIN on my query.

A simplified version of my tables looks like this:

ordered_products
    op_id INT UNSIGNED NOT NULL AUTO_INCREMENT
    op_orderid INT UNSIGNED NOT NULL
    op_orderdate TIMESTAMP NOT NULL
    op_productid INT UNSIGNED NOT NULL

products
    p_id INT UNSIGNED NOT NULL AUTO_INCREMENT
    p_productname VARCHAR(128) NOT NULL
    p_enabled TINYINT NOT NULL

The 'ordered_products' table currently has more than 1,000,000 rows and records every product that has been ordered, along with the order it belongs to. This table grows rapidly.

The 'products' table currently has around 3,000 rows and contains the list of products for sale.

The site displays a list of the top products for a given period (normally the last 3 days), and my query looks like this:

SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid
FROM ordered_products op
LEFT JOIN products p ON op.op_productid=p.p_id
WHERE op.op_orderdate>='2014-03-08 00:00:00'
AND p.p_enabled=1
GROUP BY op.op_productid
ORDER BY ProductCount DESC, p.p_productname ASC

When I run that query, it normally takes around 800 milliseconds (0.8 seconds) to execute, which is ridiculous. We've worked around this with caching, but whenever the cache expires, we get a slowdown. I need to fix this.

I have tried indexing the tables, but no matter what I try, I can't avoid the temporary table and the filesort. The output from EXPLAIN is:

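+----+-------------+-------+-------+-----------------------------+---------------------+---------+-------------------+------+-----------------------------------------------------------+
| id | select_type | table | type  | possible_keys               | key                 | key_len | ref               | rows | Extra                                                     |
+----+-------------+-------+-------+-----------------------------+---------------------+---------+-------------------+------+-----------------------------------------------------------+
|  1 | SIMPLE      | p     | index | PRIMARY,idx_enabled_id_name | idx_enabled_id_name | 782     | NULL              | 1477 | Using where; Using index; Using temporary; Using filesort |
|  1 | SIMPLE      | op    | ref   | idx_pid_oid_date            | idx_pid_oid_date    | 4       | test_store.p.p_id |    9 | Using where; Using index                                  |
+----+-------------+-------+-------+-----------------------------+---------------------+---------+-------------------+------+-----------------------------------------------------------+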

If I remove the GROUP BY, the filesort disappears, but I need it so that ProductCount gives me a count for each product rather than a single total for all products.

If I remove both the GROUP BY and the ORDER BY ProductCount, the temporary table and the filesort both disappear, but then the result set is useless.

Can anyone please help me solve this? I have tried many different indexes and have rewritten the SQL numerous times, but I never succeed.

Any help would be greatly appreciated.

You can't get rid of the temporary table and filesort while you are using ORDER BY on the calculated column ProductCount. Its value is only known after the rows have been grouped and counted, and there's no index on a calculated column, so the sorting has to be done at query time.

I tried to reproduce your results experimentally. If I put an index on op_productid, the optimizer might use it to perform the GROUP BY.
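The index definition itself isn't shown. A minimal sketch, assuming the question's column list plus a primary key on op_id (the original gives neither the key nor the exact DDL): the secondary index is named op_productid so that the FORCE INDEX (op_productid) hint in the query below can refer to it.

```sql
-- ordered_products with the columns listed in the question;
-- PRIMARY KEY (op_id) is an assumption (AUTO_INCREMENT requires a key)
CREATE TABLE ordered_products (
    op_id        INT UNSIGNED NOT NULL AUTO_INCREMENT,
    op_orderid   INT UNSIGNED NOT NULL,
    op_orderdate TIMESTAMP    NOT NULL,
    op_productid INT UNSIGNED NOT NULL,
    PRIMARY KEY (op_id)
);

-- Secondary index on op_productid. FORCE INDEX (op_productid) refers to it
-- by this name; because its entries are stored in op_productid order, the
-- optimizer can group by scanning the index instead of building a temp table.
ALTER TABLE ordered_products ADD INDEX op_productid (op_productid);
```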

mysql> EXPLAIN SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid 
FROM ordered_products op FORCE INDEX (op_productid) STRAIGHT_JOIN products p 
  ON op.op_productid=p.p_id 
WHERE op.op_orderdate>='2014-03-08 00:00:00' AND p.p_enabled=1 
GROUP BY op.op_productid ORDER BY null;

+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key          | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+
|  1 | SIMPLE      | op    | index | op_productid  | op_productid | 4       | NULL |    5 | Using where |
|  1 | SIMPLE      | p     | ALL   | PRIMARY       | NULL         | NULL    | NULL |    1 | Using where |
+----+-------------+-------+-------+---------------+--------------+---------+------+------+-------------+

Note that this query uses ORDER BY NULL, so it no longer sorts by ProductCount. Giving up that sort, while grouping through the op_productid index, is why neither "Using temporary" nor "Using filesort" appears. (Before MySQL 8.0, GROUP BY also sorted its result implicitly, and ORDER BY NULL suppresses that.)

In my case, I had to use STRAIGHT_JOIN and FORCE INDEX to override the optimizer. But that might be due to my test environment, where I have only 1 or 2 rows per table, which throws off the optimizer's choices. With your real data, it might make a more sensible choice on its own.

Also, don't use LEFT JOIN if conditions in the WHERE clause make the join implicitly an inner join. Learn the types of joins and how they work; don't default to LEFT JOIN.
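Applied to the question's query, the LEFT JOIN advice looks like the following sketch. Because p.p_enabled = 1 in the WHERE clause rejects the NULL-extended rows a LEFT JOIN would add for unmatched orders, the original query already returns inner-join results (MySQL's optimizer generally performs this outer-to-inner conversion itself), so INNER JOIN simply states what the query means. The temporary tables, primary keys, and sample rows are illustrative assumptions, not from the question. p.p_productname is added to the GROUP BY so the query stays valid under ONLY_FULL_GROUP_BY; with p_id as the primary key, that does not change the groups.

```sql
-- Scratch copies of the question's tables (primary keys assumed);
-- TEMPORARY tables disappear when the session ends
CREATE TEMPORARY TABLE products (
    p_id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    p_productname VARCHAR(128) NOT NULL,
    p_enabled     TINYINT      NOT NULL
);
CREATE TEMPORARY TABLE ordered_products (
    op_id        INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    op_orderid   INT UNSIGNED NOT NULL,
    op_orderdate TIMESTAMP    NOT NULL,
    op_productid INT UNSIGNED NOT NULL
);

-- Illustrative sample rows (hypothetical data)
INSERT INTO products (p_id, p_productname, p_enabled)
VALUES (1, 'Apple', 1), (2, 'Banana', 1), (3, 'Cherry', 0);
INSERT INTO ordered_products (op_orderid, op_orderdate, op_productid)
VALUES (100, '2014-03-09 10:00:00', 1),
       (100, '2014-03-09 10:00:00', 2),
       (101, '2014-03-10 12:00:00', 2),
       (102, '2014-03-10 13:00:00', 3),  -- disabled product
       (103, '2014-03-01 09:00:00', 1);  -- outside the date window

-- The question's query with INNER JOIN: the condition on p.p_enabled
-- already discards unmatched rows, so the LEFT JOIN never kept any
SELECT COUNT(op.op_productid) AS ProductCount, op.op_productid
FROM ordered_products op
INNER JOIN products p ON op.op_productid = p.p_id
WHERE op.op_orderdate >= '2014-03-08 00:00:00'
  AND p.p_enabled = 1
GROUP BY op.op_productid, p.p_productname
ORDER BY ProductCount DESC, p.p_productname ASC;
```

With these sample rows the query returns product 2 with a count of 2, then product 1 with a count of 1; product 3 is excluded because it is disabled, and the 2014-03-01 order falls outside the window.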

Your only alternative is to store a denormalized table in which the counts are persisted. Then, when your cache expires, refreshing it isn't an expensive query.
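A minimal sketch of such a denormalized table, under assumptions that are not in the original: counts are kept per product per calendar day (enough for windows that start at midnight, like '2014-03-08 00:00:00'), the table and column names (product_daily_counts, pdc_*) are hypothetical, and the application increments the counter for each ordered product, for example in the same transaction that inserts into ordered_products. Temporary tables are used so the sketch can be tried without touching real tables.

```sql
-- Scratch copy of the question's products table (primary key assumed)
CREATE TEMPORARY TABLE products (
    p_id          INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    p_productname VARCHAR(128) NOT NULL,
    p_enabled     TINYINT      NOT NULL
);

-- Hypothetical summary table: one row per product per calendar day
CREATE TEMPORARY TABLE product_daily_counts (
    pdc_productid INT UNSIGNED NOT NULL,
    pdc_date      DATE         NOT NULL,
    pdc_count     INT UNSIGNED NOT NULL,
    PRIMARY KEY (pdc_productid, pdc_date)
);

INSERT INTO products (p_id, p_productname, p_enabled)
VALUES (1, 'Apple', 1), (2, 'Banana', 1), (3, 'Cherry', 0);

-- Run once per ordered product: the first order of the day inserts a row,
-- later ones hit the primary key and increment the existing counter
INSERT INTO product_daily_counts (pdc_productid, pdc_date, pdc_count)
VALUES (2, '2014-03-09', 1)
ON DUPLICATE KEY UPDATE pdc_count = pdc_count + 1;
INSERT INTO product_daily_counts (pdc_productid, pdc_date, pdc_count)
VALUES (2, '2014-03-09', 1)
ON DUPLICATE KEY UPDATE pdc_count = pdc_count + 1;
INSERT INTO product_daily_counts (pdc_productid, pdc_date, pdc_count)
VALUES (1, '2014-03-10', 1), (3, '2014-03-10', 1), (1, '2014-03-01', 1)
ON DUPLICATE KEY UPDATE pdc_count = pdc_count + 1;

-- Top products since a given day: reads the small summary table
-- instead of every ordered_products row
SELECT pdc.pdc_productid, SUM(pdc.pdc_count) AS ProductCount
FROM product_daily_counts pdc
INNER JOIN products p ON p.p_id = pdc.pdc_productid
WHERE pdc.pdc_date >= '2014-03-08'
  AND p.p_enabled = 1
GROUP BY pdc.pdc_productid, p.p_productname
ORDER BY ProductCount DESC, p.p_productname ASC;
```

The final query still sorts on an aggregate, so EXPLAIN will likely still report a temporary table and filesort, but over at most one row per product per day in the window (roughly 3,000 × 3 here) instead of every ordered_products row. With the sample rows it returns product 2 (count 2) ahead of product 1 (count 1).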

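Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish this page, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.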