Why is this query doing a full table scan?

The query:

SELECT tbl1.*
   FROM tbl1 
JOIN tbl2
     ON (tbl1.t1_pk  = tbl2.t2_fk_t1_pk
AND tbl2.t2_strt_dt <= sysdate
AND tbl2.t2_end_dt  >= sysdate)
JOIN tbl3 on (tbl3.t3_pk = tbl2.t2_fk_t3_pk
AND tbl3.t3_lkup_1 = 2577304
AND tbl3.t3_lkup_2 = 1220833)
where tbl2.t2_lkup_1   = 1020000002981587;

Facts:

  • Oracle XE
  • tbl1.t1_pk is a primary key.
  • tbl2.t2_fk_t1_pk is a foreign key referencing that t1_pk column.
  • tbl2.t2_lkup_1 is indexed.
  • tbl3.t3_pk is a primary key.
  • tbl2.t2_fk_t3_pk is a foreign key referencing that t3_pk column.

Explain plan on a database with 11,000 rows in tbl1 and 3,500 rows in tbl2 shows that it's doing a full table scan on tbl1. It seems to me that it should be faster if it could do an index lookup on tbl1.

Update: I tried the hint a few of you suggested, and the explain plan cost got much worse! Now I'm really confused.

Further update: I finally got access to a copy of the production database, and "explain plan" showed it using indexes, with a much lower query cost. I guess having more data (over 100,000 rows in tbl1 and 50,000 rows in tbl2) was what it took to make the optimizer decide that indexes were worth it. Thanks to everybody who helped. I still think Oracle performance tuning is a black art, but I'm glad some of you understand it.

Further update: I've updated the question at the request of my former employer. They don't like their table names showing up in Google queries. I should have known better.

The easy answer: because the optimizer expects to find more rows than it actually does find.

Check the statistics: are they up to date? Check the expected cardinality in the explain plan: does it match the actual results? If not, fix the statistics relevant to that step.
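One way to make that comparison is to run the query once with runtime statistics collected, then print the estimated row counts (E-Rows) next to the actual ones (A-Rows). This is only a sketch: it assumes Oracle 10g or later, a session that is allowed to read the V$ views DBMS_XPLAN.DISPLAY_CURSOR relies on, and (in SQL*Plus) SERVEROUTPUT switched off so that the "last statement" really is the query.

```sql
-- Run the query once, collecting row-source statistics for this execution.
SELECT /*+ gather_plan_statistics */ tbl1.*
  FROM tbl1
  JOIN tbl2 ON (tbl1.t1_pk = tbl2.t2_fk_t1_pk
            AND tbl2.t2_strt_dt <= sysdate
            AND tbl2.t2_end_dt  >= sysdate)
  JOIN tbl3 ON (tbl3.t3_pk = tbl2.t2_fk_t3_pk
            AND tbl3.t3_lkup_1 = 2577304
            AND tbl3.t3_lkup_2 = 1220833)
 WHERE tbl2.t2_lkup_1 = 1020000002981587;

-- Show the plan of the last statement executed in this session,
-- with estimated (E-Rows) and actual (A-Rows) row counts per step.
SELECT *
  FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```

A step where E-Rows and A-Rows differ by an order of magnitude is usually the place where the statistics deserve a closer look.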

Histograms for the joined columns might help. Oracle will use those to estimate the cardinality resulting from a join.
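To see whether any histograms exist yet, the column statistics can be read from the data dictionary. A minimal sketch, assuming the three tables belong to the current user and were created with unquoted (hence uppercase) names:

```sql
-- HISTOGRAM is NONE when only basic column statistics were gathered;
-- NUM_BUCKETS greater than 1 indicates that a histogram is present.
SELECT table_name, column_name, num_distinct, num_buckets, histogram
  FROM user_tab_col_statistics
 WHERE table_name IN ('TBL1', 'TBL2', 'TBL3')
 ORDER BY table_name, column_name;
```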

Of course, you can always force index usage with a hint.

It would be useful to see the optimizer's row count estimates, which are not in the SQL Developer output you posted.
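Those estimates can be printed from any SQL client with EXPLAIN PLAN and DBMS_XPLAN.DISPLAY. The sketch below assumes the default PLAN_TABLE is available to your user; the Rows column of the output holds the optimizer's estimate for each step.

```sql
-- Store the plan for the query without executing it.
EXPLAIN PLAN FOR
SELECT tbl1.*
  FROM tbl1
  JOIN tbl2 ON (tbl1.t1_pk = tbl2.t2_fk_t1_pk
            AND tbl2.t2_strt_dt <= sysdate
            AND tbl2.t2_end_dt  >= sysdate)
  JOIN tbl3 ON (tbl3.t3_pk = tbl2.t2_fk_t3_pk
            AND tbl3.t3_lkup_1 = 2577304
            AND tbl3.t3_lkup_2 = 1220833)
 WHERE tbl2.t2_lkup_1 = 1020000002981587;

-- Print the most recently explained plan, including the Rows, Bytes and Cost columns.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```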

I note that the two index lookups it is doing are RANGE SCAN, not UNIQUE SCAN. So its estimates of how many rows are being returned could easily be far off (whether statistics are up to date or not).

My guess is that its estimate of the final row count from the TABLE ACCESS of TBL2 is fairly high, so it thinks that it will find a large number of matches in TBL1 and therefore decides on doing a full scan/hash join rather than a nested loop/index scan.

For some real fun, you could run the query with event 10053 enabled and get a trace showing the calculations performed by the optimizer.
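A rough sketch of such a session follows. It assumes you hold the ALTER SESSION privilege and can read the server's trace directory (user_dump_dest on older releases). The comment inside the SELECT is an arbitrary marker chosen for this example; it makes the statement text new so that Oracle has to hard parse it, which is when the 10053 trace is written.

```sql
-- Turn on optimizer tracing for this session.
ALTER SESSION SET EVENTS '10053 trace name context forever, level 1';

-- Any previously unseen statement text forces a hard parse.
SELECT /* trace_10053_run_1 */ tbl1.*
  FROM tbl1
  JOIN tbl2 ON (tbl1.t1_pk = tbl2.t2_fk_t1_pk
            AND tbl2.t2_strt_dt <= sysdate
            AND tbl2.t2_end_dt  >= sysdate)
  JOIN tbl3 ON (tbl3.t3_pk = tbl2.t2_fk_t3_pk
            AND tbl3.t3_lkup_1 = 2577304
            AND tbl3.t3_lkup_2 = 1220833)
 WHERE tbl2.t2_lkup_1 = 1020000002981587;

-- Turn tracing off again.
ALTER SESSION SET EVENTS '10053 trace name context off';
```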

Oracle tries to return the result set with the least amount of I/O required (typically, which makes sense because I/O is slow). An index access takes at least 2 I/O calls: one to the index and one to the table. Usually it takes more, depending on the size of the index and the table, the number of records returned, where they are in the datafile, ...

This is where statistics come in. Let's say your query is estimated to return 10 records. The optimizer may calculate that using an index will take 10 I/O calls. Let's say your table, according to its statistics, resides in 6 blocks in the data file. It will be faster for Oracle to do a full scan (6 I/Os) than to read the index, read the table, read the index for the next matching key, read the table, and so on.

So in your case, the table may be really small. The statistics may be off.
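Both possibilities can be checked from the data dictionary. A small sketch, assuming the tables are owned by the current user and have unquoted (uppercase) names:

```sql
-- NUM_ROWS and BLOCKS are what the optimizer believes about each table;
-- LAST_ANALYZED shows when those figures were last refreshed.
SELECT table_name, num_rows, blocks, last_analyzed
  FROM user_tables
 WHERE table_name IN ('TBL1', 'TBL2', 'TBL3')
 ORDER BY table_name;
```

If BLOCKS for TBL1 is small, a full scan of it may well be the cheapest plan; if NUM_ROWS is far from the real row count, the statistics are stale.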

I use the following to gather statistics, customizing it for my exact needs:

begin

  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => '&owner',
    tabname          => '&table_name',
    estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE,
    granularity      => 'ALL',
    cascade          => TRUE);

  -- Variant for a single partition:
  -- DBMS_STATS.GATHER_TABLE_STATS(ownname => '&owner', tabname => '&table_name', partname => '&partion_name', granularity => 'PARTITION', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade => TRUE);

  -- Variant for a single partition, with histograms on all indexed columns:
  -- DBMS_STATS.GATHER_TABLE_STATS(ownname => '&owner', tabname => '&table_name', partname => '&partion_name', granularity => 'PARTITION', estimate_percent => dbms_stats.AUTO_SAMPLE_SIZE, cascade => TRUE, method_opt => 'for all indexed columns size 254');

end;

You can only tell by looking at the query plan the SQL optimizer/executor creates. It will be based at least partly on index statistics, which cannot be predicted from the definition alone (and can, therefore, change over time).

SQL Server Management Studio for SQL Server 2005/2008, Query Analyzer for earlier versions.

(Can't recall the right tool names for Oracle.)

Try adding an index hint.

SELECT /*+ index(tbl1 tbl1_index_name) */ .....

Sometimes Oracle just doesn't know which index to use.

Depending on your expected result size, you can play around with some session parameters:

SHOW PARAMETER optimizer_index_cost_adj;
[...]
ALTER SESSION SET optimizer_index_cost_adj = 10;

SHOW PARAMETER OPTIMIZER_MODE;
[...]
ALTER SESSION SET OPTIMIZER_MODE=FIRST_ROWS_100;

And don't forget to check the real execution time; sometimes the plan is not the real world ;)

Apparently this query gives the same plan:

SELECT tbl1.*   
FROM tbl1 
JOIN tbl2 ON (tbl1.t1_pk  = tbl2.t2_fk_t1_pk)
JOIN tbl3 on (tbl3.t3_pk = tbl2.t2_fk_t3_pk)
where tbl2.t2_lkup_1   = 1020000002981587
AND tbl2.t2_strt_dt <= sysdate
AND tbl2.t2_end_dt  >= sysdate
AND tbl3.t3_lkup_1 = 2577304
AND tbl3.t3_lkup_2 = 1220833;

What happens if you rewrite this query to:

SELECT tbl1.*    
FROM  tbl1 
,     tbl2
,     tbl3  
where tbl2.t2_lkup_1   = 1020000002981587 
AND   tbl1.t1_pk  = tbl2.t2_fk_t1_pk 
AND   tbl3.t3_pk = tbl2.t2_fk_t3_pk 
AND   tbl2.t2_strt_dt <= sysdate 
AND   tbl2.t2_end_dt  >= sysdate 
AND   tbl3.t3_lkup_1 = 2577304 
AND   tbl3.t3_lkup_2 = 1220833;

It looks like an index for the tbl1 table is not being picked up. Make sure you have an index on the t2_lkup_1 column. If that column is part of a multi-column index, it should be the leading column; otherwise the index is generally not applicable to this predicate.
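The index definitions on tbl2 can be inspected with a dictionary query such as the following sketch, which assumes the table is owned by the current user and has an unquoted (uppercase) name:

```sql
-- One row per indexed column; COLUMN_POSITION = 1 marks the leading column of each index.
SELECT index_name, column_name, column_position
  FROM user_ind_columns
 WHERE table_name = 'TBL2'
 ORDER BY index_name, column_position;
```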

(In addition to Matt's comment.) From your query, I believe you're joining because you want to filter records, not because you need a JOIN, which may increase the cardinality of the result set from the tbl1 table if there are duplicate matches. See Jeff Atwood's comment.

Try this, which uses EXISTS with a join (which is really fast on Oracle):

select *
  from tbl1
 where exists (
         select *
           from tbl2, tbl3
          where tbl2.t2_fk_t1_pk = tbl1.t1_pk and
                tbl2.t2_fk_t3_pk = tbl3.t3_pk and
                -- tbl2 is only in scope inside the subquery, so its filter belongs here
                tbl2.t2_lkup_1 = 1020000002981587 and
                sysdate between tbl2.t2_strt_dt and tbl2.t2_end_dt and
                tbl3.t3_lkup_1 = 2577304 and
                tbl3.t3_lkup_2 = 1220833);
