
Optimizing a SQL query to avoid full table scan

Consider the following query:

SELECT * FROM Transactions
WHERE day(Stamp - interval 3 hour) = 1;

The Stamp column in the Transactions table is a TIMESTAMP and there is an index on it. How could I change this query so it avoids full table scans (that is, so that Stamp is used outside the day() function)?
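A minimal sketch for reproducing the problem in a scratch database, assuming MySQL and a hypothetical schema (the PK and amount columns, the idx_stamp name and the sample rows are illustrative, not from the question). Because Stamp is wrapped inside an expression, the optimizer cannot turn the condition into a range on the Stamp index, so EXPLAIN would be expected to report the access type ALL, i.e. a full table scan.

```sql
-- Hypothetical schema for experimenting (MySQL); names are illustrative.
-- An explicit DEFAULT without ON UPDATE keeps Stamp from auto-updating.
CREATE TABLE Transactions (
  PK     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  Stamp  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
  amount DECIMAL(10, 2) NOT NULL,
  INDEX idx_stamp (Stamp)
) ENGINE = InnoDB;

INSERT INTO Transactions (Stamp, amount) VALUES
  ('2010-06-01 02:30:00', 10.00),  -- minus 3 h = May 31 23:30: no match
  ('2010-06-01 03:00:00', 20.00),  -- minus 3 h = June 1 00:00: match
  ('2010-06-02 02:59:59', 30.00),  -- minus 3 h = June 1 23:59:59: match
  ('2010-06-02 03:00:00', 40.00);  -- minus 3 h = June 2 00:00: no match

-- Stamp sits inside an expression, so idx_stamp cannot drive a range
-- lookup; the plan is expected to show type = ALL.
EXPLAIN SELECT * FROM Transactions
WHERE DAY(Stamp - INTERVAL 3 HOUR) = 1;
```

The sample rows sit on either side of the 03:00 boundary, so only the second and third satisfy the condition.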

Thanks!

This is how I would do it:

Add some extra fields: YEAR, MONTH, DAY, or even HOUR and MINUTE, depending on the traffic you expect. Then build a trigger to populate the extra fields, possibly subtracting the 3-hour interval in advance. Finally, build an index on the extra fields.
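A sketch of this approach for MySQL that stores only the shifted day of the month. The stamp_day column, the index name and the trigger names are hypothetical, and the 3-hour shift is applied once, when each row is written. It assumes every client writes with the same time_zone setting, because a TIMESTAMP is converted to the session time zone before DAYOFMONTH() sees it.

```sql
-- Hypothetical extra column holding DAYOFMONTH(Stamp - 3 hours), plus an index.
ALTER TABLE Transactions
  ADD COLUMN stamp_day TINYINT UNSIGNED NULL,
  ADD INDEX idx_stamp_day (stamp_day);

-- Backfill existing rows. Assigning Stamp = Stamp stops an auto-updating
-- TIMESTAMP column (ON UPDATE CURRENT_TIMESTAMP) from being reset to now.
UPDATE Transactions
SET stamp_day = DAYOFMONTH(Stamp - INTERVAL 3 HOUR),
    Stamp = Stamp;

-- Keep the column in sync for new rows...
CREATE TRIGGER transactions_stamp_day_bi
BEFORE INSERT ON Transactions
FOR EACH ROW
  SET NEW.stamp_day = DAYOFMONTH(NEW.Stamp - INTERVAL 3 HOUR);

-- ...and for rows whose Stamp changes.
CREATE TRIGGER transactions_stamp_day_bu
BEFORE UPDATE ON Transactions
FOR EACH ROW
  SET NEW.stamp_day = DAYOFMONTH(NEW.Stamp - INTERVAL 3 HOUR);

-- The filter is now on a bare, indexed column.
SELECT * FROM Transactions WHERE stamp_day = 1;
```

Single-statement trigger bodies like these need no BEGIN ... END block, so no DELIMITER change is required in the mysql client.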

If the goal is just to avoid full table scans and you have a PRIMARY KEY (say, named PK) on Transactions, consider adding a covering index:

ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)

Then:

SELECT * FROM Transactions WHERE PK IN (
  SELECT PK FROM Transactions
  WHERE day(Stamp - interval 3 hour) = 1
);

This query should not use a full table scan, although the optimizer may still decide to use one if the table has few rows, or for some other statistical reason. :)
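One way to check what the optimizer actually does is to prefix the statement with EXPLAIN. This sketch assumes MySQL, the cover_1 index above and the column names used in this answer. Even in the best case the subquery still reads every entry of the index, since the function on Stamp rules out a range lookup; what it avoids is reading the full table rows.

```sql
-- Assumes MySQL and the cover_1 (PK, Stamp) index created above.
EXPLAIN
SELECT * FROM Transactions
WHERE PK IN (
  SELECT PK FROM Transactions
  WHERE DAY(Stamp - INTERVAL 3 HOUR) = 1
);
-- In the row that reads the inner Transactions, look for:
--   key = cover_1                 -> the covering index was chosen
--   type = index                  -> the whole index is scanned (not a range)
--   Extra contains "Using index"  -> values come from the index alone
-- A row with type = ALL means a full table scan happened after all.
```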

A better way may be to use a temporary table instead of a subquery.
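A sketch of the temporary-table variant, assuming MySQL and the same PK column; tmp_matching_pk is a hypothetical name. The matching keys are materialized once (ideally from a covering index), then the full rows are fetched by primary key.

```sql
-- Step 1: collect the keys of matching rows into a session-local table.
CREATE TEMPORARY TABLE tmp_matching_pk AS
SELECT PK
FROM Transactions
WHERE DAY(Stamp - INTERVAL 3 HOUR) = 1;

-- Step 2: fetch the full rows by primary key lookups.
SELECT t.*
FROM Transactions AS t
JOIN tmp_matching_pk AS m ON m.PK = t.PK;
```

A TEMPORARY table is visible only to the current session and is dropped automatically when the session ends, or explicitly with DROP TEMPORARY TABLE tmp_matching_pk.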

You can often rewrite the condition so that it looks like WHERE Stamp = XXXX, where XXXX is some expression that does not involve Stamp. You could create a series of BETWEEN conditions, one for each month, WHERE Stamp BETWEEN timestamp('2010-01-01 03:00:00') AND timestamp('2010-01-02 02:59:59') OR Stamp BETWEEN ..., but I'm not certain this would use the index in this case. I'd build a column holding the day of the month, as @petr suggests.

Calculate the desired Stamp value separately before you run your main query, i.e.:

Step 1 - calculate the desired Stamp value

Step 2 - run a query that compares the bare Stamp column with it, e.g. Stamp > (calculated value)

Because there is no calculation on Stamp in step 2, the index should be usable.
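A sketch of the two steps in MySQL, assuming only one month is needed per query (here, the current month); @window_start is a hypothetical user-variable name. The half-open range covers exactly the window in which DAY(Stamp - INTERVAL 3 HOUR) = 1 for that month.

```sql
-- Step 1: compute the boundary once, outside the WHERE clause:
-- 03:00 on the 1st of the current month.
SET @window_start = TIMESTAMP(DATE_FORMAT(CURDATE(), '%Y-%m-01'), '03:00:00');

-- Step 2: compare the bare Stamp column with constants, so a range
-- scan on the Stamp index is possible.
SELECT *
FROM Transactions
WHERE Stamp >= @window_start
  AND Stamp <  @window_start + INTERVAL 1 DAY;
```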

If I understand it correctly, you basically want to return all rows where the stamp falls on the first day of any month (after subtracting the 3 hours)? If (and this is a big if) you have a fixed window of recent months (June to December 2010 in the example), you could just enumerate one range test per month. Still, I'm not sure indexed access will be faster anyway.

select *
  from transactions
 where stamp between timestamp '2010-06-01 03:00:00' and timestamp '2010-06-02 02:59:59'
    or stamp between timestamp '2010-07-01 03:00:00' and timestamp '2010-07-02 02:59:59'
    or stamp between timestamp '2010-08-01 03:00:00' and timestamp '2010-08-02 02:59:59'
    or stamp between timestamp '2010-09-01 03:00:00' and timestamp '2010-09-02 02:59:59'
    or stamp between timestamp '2010-10-01 03:00:00' and timestamp '2010-10-02 02:59:59'
    or stamp between timestamp '2010-11-01 03:00:00' and timestamp '2010-11-02 02:59:59'
    or stamp between timestamp '2010-12-01 03:00:00' and timestamp '2010-12-02 02:59:59';

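NB! These BETWEEN bounds assume whole-second timestamps. If Stamp stores fractional seconds (for example TIMESTAMP(3), available from MySQL 5.6.4), a value such as 02:59:59.500 falls outside the ranges; half-open tests of the form Stamp >= '2010-06-01 03:00:00' AND Stamp < '2010-06-02 03:00:00' avoid any need for padding.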
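If the window is not fixed, MySQL 8.0 and later can generate the month boundaries with a recursive common table expression instead of hard-coding them. This is a sketch under that version assumption; bounds, month_windows and window_start are hypothetical names. It uses half-open ranges, so fractional seconds need no padding, and whether the optimizer actually uses the Stamp index for the join depends on its cost estimates.

```sql
-- Requires MySQL 8.0+ (WITH RECURSIVE).
WITH RECURSIVE
  bounds AS (
    -- Oldest and newest Stamp; served from the Stamp index.
    SELECT MIN(Stamp) AS lo, MAX(Stamp) AS hi FROM Transactions
  ),
  month_windows (window_start) AS (
    -- 03:00 on the 1st of the month holding the oldest row.
    SELECT TIMESTAMP(DATE_FORMAT(lo, '%Y-%m-01'), '03:00:00') FROM bounds
    UNION ALL
    -- One more month while the newest row is still ahead.
    SELECT w.window_start + INTERVAL 1 MONTH
    FROM month_windows AS w
    JOIN bounds AS b ON w.window_start < b.hi
  )
SELECT t.*
FROM month_windows AS w
JOIN Transactions AS t
  ON t.Stamp >= w.window_start                    -- half-open range:
 AND t.Stamp <  w.window_start + INTERVAL 1 DAY;  -- [day 1 03:00, day 2 03:00)
```

Each window spans one calendar day starting at 03:00, so the windows never overlap and no row is returned twice. An empty table yields a NULL start and therefore no windows. Recursion is capped by cte_max_recursion_depth (1000 by default), which allows roughly 83 years of months.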

Reworking petr's answer a bit to avoid the IN clause (MySQL versions before 5.6 usually execute IN (SELECT ...) as a dependent subquery, re-evaluated for every outer row), and to make it work for either MyISAM or InnoDB.

For MyISAM:

ALTER TABLE Transactions ADD INDEX cover_1 (PK, Stamp)

Or, for InnoDB, where the PK is implicitly included in every secondary index:

ALTER TABLE Transactions ADD INDEX Stamp (Stamp)

Then:

SELECT Transactions.*
FROM Transactions
  -- inner join: keep only the rows whose PK the subquery returned
  JOIN
  (
  SELECT PK
  FROM Transactions
  WHERE DAYOFMONTH(Stamp - interval 3 hour) = 1
  ) a ON Transactions.PK = a.PK;

The subquery will be executed from the index alone, and the outer query will only fetch the table rows whose PK came through a.
