Optimizing a SQL query to avoid full table scan
Consider the following query:
SELECT * FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1;
The Stamp column in the Transactions table is a TIMESTAMP and there is an index on it. How could I change this query so it avoids full table scans (that is, using Stamp outside of the day() function)?
Thanks!
This is how I would do it:
Add some extra fields: YEAR, MONTH, DAY or even HOUR, MINUTE, depending on the traffic you expect. Then build a trigger to populate the extra fields, maybe subtracting the 3 hour interval in advance. Finally, build an index on the extra fields.
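The extra-field approach could look roughly like the following in MySQL. This is only a sketch: the column name StampDay, the trigger names and the index name are made up for illustration, and it assumes that the application always supplies Stamp explicitly on INSERT and UPDATE (if Stamp is filled in by DEFAULT CURRENT_TIMESTAMP or ON UPDATE CURRENT_TIMESTAMP, check what the triggers actually see before relying on this).

```sql
-- Hypothetical helper column holding the day of month of (Stamp - 3 hours).
ALTER TABLE Transactions ADD COLUMN StampDay TINYINT UNSIGNED NULL;

-- Backfill existing rows. Stamp is set to itself so that an auto-updating
-- TIMESTAMP column (ON UPDATE CURRENT_TIMESTAMP) keeps its current value.
UPDATE Transactions
SET Stamp = Stamp,
    StampDay = DAY(Stamp - INTERVAL 3 HOUR);

-- Keep the helper column in sync for new and changed rows.
CREATE TRIGGER Transactions_stampday_bi
BEFORE INSERT ON Transactions
FOR EACH ROW SET NEW.StampDay = DAY(NEW.Stamp - INTERVAL 3 HOUR);

CREATE TRIGGER Transactions_stampday_bu
BEFORE UPDATE ON Transactions
FOR EACH ROW SET NEW.StampDay = DAY(NEW.Stamp - INTERVAL 3 HOUR);

-- Index the helper column so the filter no longer wraps Stamp in a function.
CREATE INDEX idx_transactions_stampday ON Transactions (StampDay);

-- The original query then becomes a plain equality test on an indexed column.
SELECT * FROM Transactions WHERE StampDay = 1;
```

Whether the optimizer actually picks the new index depends on how selective the value is; roughly one row in thirty has StampDay = 1, so it is worth checking the plan with EXPLAIN.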
If the goal is just to avoid full table scans and you have a PRIMARY KEY (say named PK) for Transactions, consider adding a covering index:
ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)
Then:
SELECT * FROM Transactions WHERE PK IN (SELECT PK FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1
)
This query should not use a full table scan (however, the optimizer may still decide to use one if the number of rows in the table is small, or for whatever other statistical reason :) ).
A better way may be to use a temporary table instead of the subquery.
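A minimal sketch of the temporary-table variant, assuming the cover_1 index above exists and that the primary key column really is called PK; the table name tmp_first_day_pk is made up for illustration.

```sql
-- Collect the matching keys first. This SELECT only needs PK and Stamp,
-- so it can be answered from the (PK, Stamp) index alone.
CREATE TEMPORARY TABLE tmp_first_day_pk AS
SELECT PK
FROM Transactions
WHERE DAY(Stamp - INTERVAL 3 HOUR) = 1;

-- Fetch the full rows by primary key.
SELECT t.*
FROM Transactions AS t
JOIN tmp_first_day_pk AS k ON t.PK = k.PK;

DROP TEMPORARY TABLE tmp_first_day_pk;
```

Note that the first statement still reads every entry of the index; it avoids scanning the table rows, not the index.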
You can often rewrite the function so you have something that looks like WHERE Stamp=XXXX, where XXXX is some expression. You could create a series of BETWEEN statements for each month,
WHERE Stamp BETWEEN timestamp('2010-01-01 00:00:00') AND timestamp('2010-01-01 23:59:59') OR Stamp BETWEEN ...
, but I'm not certain this would use the index in this case. I'd build a column that was the day of the month as @petr suggests.
Calculate your desired Stamp value separately before you run your main query, i.e.:
Step 1 - calculate the desired Stamp value
Step 2 - run a query where Stamp > (calculated value)
Because there's no calculation in step 2, you should be able to use your index.
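The two steps might be sketched like this with MySQL user variables. The variable names and the example month (June 2010) are assumptions for illustration, and the sketch covers a single month only; the original query matches the first day of every month, so each month of interest would need its own range.

```sql
-- Step 1: calculate the boundaries once, outside the main query.
-- Rows whose (Stamp - 3 hours) falls on 2010-06-01 have
-- Stamp in [2010-06-01 03:00:00, 2010-06-02 03:00:00).
SET @range_start = TIMESTAMP('2010-06-01 00:00:00') + INTERVAL 3 HOUR;
SET @range_end   = @range_start + INTERVAL 1 DAY;

-- Step 2: compare the bare column against the precomputed values,
-- so the index on Stamp can be used for a range scan.
SELECT *
FROM Transactions
WHERE Stamp >= @range_start
  AND Stamp <  @range_end;
```

The point of the design is that all arithmetic sits on the constant side of the comparison; running EXPLAIN on step 2 is the way to confirm that a range access on the Stamp index is chosen.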
If I understand it correctly, you basically want to return all rows where the stamp falls on the first of each month (after subtracting the 3 hours)? If (and this is a big if) you have a fixed window of, say, the latest 6 months, you could just enumerate 6 range tests. But still, I'm not sure indexed access will be faster anyway.
select *
from transactions
where stamp between timestamp '2010-06-01 03:00:00' and timestamp '2010-06-02 02:59:59'
or stamp between timestamp '2010-07-01 03:00:00' and timestamp '2010-07-02 02:59:59'
or stamp between timestamp '2010-08-01 03:00:00' and timestamp '2010-08-02 02:59:59'
or stamp between timestamp '2010-09-01 03:00:00' and timestamp '2010-09-02 02:59:59'
or stamp between timestamp '2010-10-01 03:00:00' and timestamp '2010-10-02 02:59:59'
or stamp between timestamp '2010-11-01 03:00:00' and timestamp '2010-11-02 02:59:59'
or stamp between timestamp '2010-12-01 03:00:00' and timestamp '2010-12-02 02:59:59';
NB! I'm not sure how the millisecond part of the timestamp works. You may need to pad it accordingly.
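One way to sidestep the fractional-second question is to use half-open ranges (>= start, < next start) instead of BETWEEN with a 02:59:59 upper bound. A sketch for two of the months above; further months would be added as extra OR branches in the same pattern:

```sql
SELECT *
FROM Transactions
WHERE (Stamp >= '2010-06-01 03:00:00' AND Stamp < '2010-06-02 03:00:00')
   OR (Stamp >= '2010-07-01 03:00:00' AND Stamp < '2010-07-02 03:00:00');
```

With an exclusive upper bound, a value such as 2010-06-02 02:59:59.500 is still included, regardless of the precision the column stores.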
Reworking petr's answer a bit to avoid the IN clause, and to make it work for either MyISAM or InnoDB.
For MyISAM:
ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)
Or, for InnoDB, where the PK is implicitly included in every index:
ALTER TABLE Transactions ADD INDEX Stamp (Stamp)
Then:
SELECT *
FROM Transactions INNER JOIN
(
SELECT PK
FROM Transactions
WHERE DAYOFMONTH(Stamp - interval 3 hour) = 1
) a ON Transactions.PK=a.PK
The subquery will have an index-only execution, and the outer query will only pull the rows from the table where a.PK came through.
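To check whether the subquery really is answered from the index alone, EXPLAIN can be run on it. This is only a sketch of how to verify the claim; the exact plan depends on the storage engine, the indexes present and the table statistics.

```sql
-- If the index covers both PK and Stamp, the Extra column of the plan
-- is expected to contain "Using index" (index-only access).
EXPLAIN
SELECT PK
FROM Transactions
WHERE DAYOFMONTH(Stamp - INTERVAL 3 HOUR) = 1;
```

Because Stamp is still wrapped in a function, the plan will typically show a scan over the whole index rather than a range lookup; the gain is that the index is usually much smaller than the table.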