How to reduce query execution time for tables with huge amounts of data

Question

I am running this query in production (Oracle) and it takes more than 3 minutes. Is there any way to reduce the execution time? Both the svc_order and event tables contain almost 1 million records.

select 0 test_section, count(1) count, 'DD' test_section_value  
from svc_order so, event e  
where so.svc_order_id = e.svc_order_id  
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')  
  and e.event_type = 230 and e.event_level = 'O'  
  and e.current_sched_date between 
      to_date( '09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')  
  and (((so.sots_ta = 'N') and (so.action_type = 0)) 
       or  ((so.sots_ta is null) and (so.action_type = 0)) 
       or  ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL'

Answer 1

Since you said that you cannot create indexes, I assume the query is doing a full table scan. Please try a parallel hint.

select /*+ full(so) parallel(so, 4) */ 0 test_section, count(1) count, 'DD' test_section_value  
from svc_order so, event e  
where so.svc_order_id = e.svc_order_id  
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')  
  and e.event_type = 230 and e.event_level = 'O'  
  and e.current_sched_date between 
      to_date( '09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')  
  and (((so.sots_ta = 'N') and (so.action_type = 0)) 
       or  ((so.sots_ta is null) and (so.action_type = 0)) 
       or  ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL'
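
As a possible variant (a sketch, not a tested recommendation): the hint above parallelizes only the scan of svc_order. Since event is about the same size, both scans could be hinted. The degree of 4 is carried over from the query above and is an assumption about spare capacity; parallel query also has to be available and enabled on the instance.

```sql
-- Sketch: full scans of both tables, each with parallel degree 4 (assumed value).
select /*+ full(so) parallel(so, 4) full(e) parallel(e, 4) */
       0 test_section, count(1) count, 'DD' test_section_value
from svc_order so, event e
where so.svc_order_id = e.svc_order_id
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
  and e.event_type = 230 and e.event_level = 'O'
  and e.current_sched_date between
      to_date('09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')
  and (((so.sots_ta = 'N') and (so.action_type = 0))
       or ((so.sots_ta is null) and (so.action_type = 0))
       or ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL'
```

Comparing the elapsed time with and without the hints on a quiet system would show whether the extra parallel workers actually help.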

Answer 2

You could at least avoid the triple AND/OR list by using COALESCE() (or its Oracle-specific equivalent NVL()). Note: this is not strictly equivalent to the original condition, because it also matches rows where both sots_ta and action_type are NULL, which the original excludes.

SELECT 0 test_section, count(1) count, 'DD' test_section_value
FROM svc_order so 
JOIN event e  ON so.svc_order_id = e.svc_order_id
WHERE e.event_type = 230 and e.event_level = 'O'  
  AND so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
  AND e.current_sched_date >= to_date('09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
  AND e.current_sched_date  < to_date('10/01/2013 00:00:00', 'MM/DD/YYYY HH24:MI:SS') 
  AND  COALESCE(so.sots_ta, 'N') = 'N'
  AND  COALESCE(so.action_type, 0) = 0   
  AND so.company_code = 'LL'

I replaced the BETWEEN with a plain (t >= low AND t < high) test because I don't like BETWEEN's semantics. I replaced the comma list in the FROM clause with a JOIN because I like joins better.
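
To see where the COALESCE() form and the original AND/OR list disagree, the following sketch evaluates both predicates over the four NULL/non-NULL combinations. The inline row set `combos` and the output column names are made up for illustration; nothing here touches the real tables.

```sql
-- Hypothetical test rows: every combination of 'N'/NULL and 0/NULL.
WITH combos AS (
    SELECT 'N' AS sots_ta, 0 AS action_type FROM dual UNION ALL
    SELECT CAST(NULL AS VARCHAR2(1)), 0 FROM dual UNION ALL
    SELECT 'N', CAST(NULL AS NUMBER) FROM dual UNION ALL
    SELECT CAST(NULL AS VARCHAR2(1)), CAST(NULL AS NUMBER) FROM dual
)
SELECT sots_ta,
       action_type,
       -- predicate from the question
       CASE WHEN (sots_ta = 'N' AND action_type = 0)
              OR (sots_ta IS NULL AND action_type = 0)
              OR (sots_ta = 'N' AND action_type IS NULL)
            THEN 'Y' ELSE 'N' END AS original_match,
       -- predicate from this answer
       CASE WHEN COALESCE(sots_ta, 'N') = 'N'
             AND COALESCE(action_type, 0) = 0
            THEN 'Y' ELSE 'N' END AS coalesce_match
FROM combos;
```

Only the row where both columns are NULL should differ between the two columns. If that case matters, adding `AND NOT (so.sots_ta IS NULL AND so.action_type IS NULL)` to the COALESCE() version restores the original semantics.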

Answer 3

We cannot add indexes, but each table must at least have a meaningful primary key, right? Is there one? That should give you at least one index, clustered or non-clustered. Look at it and try to make use of it.

If the table is a heap and we want to deal with it as it is, then we should reduce the number of rows from each table individually by applying the respective WHERE filters, and then combine the result sets. In your query, the only meaningful result column that depends on the base tables is count(1); the other two columns are constants. Because a JOIN, a Cartesian product, etc. will lead the DB engine to look for indexes, use INTERSECT instead, which I feel should work better in your case. Some other changes you can make: avoid using TO_DATE or any other function on the right side of a WHERE condition; prepare the value in a local variable and use that variable in the query. You should also check whether using >= gives any performance gain over BETWEEN.
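
The "local variable" suggestion could look roughly like the PL/SQL sketch below. The variable names are hypothetical and the values are copied from the question. Since TO_DATE applied to a literal is a constant expression, the gain from this change may well be small; the sketch only shows the mechanics.

```sql
SET SERVEROUTPUT ON

DECLARE
  -- Hypothetical variable names; values taken from the question's query.
  v_entered_from DATE := TO_DATE('01/01/2012', 'MM/DD/YYYY');
  v_sched_from   DATE := TO_DATE('09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS');
  v_sched_to     DATE := TO_DATE('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS');
  v_count        NUMBER;
BEGIN
  SELECT COUNT(1)
  INTO   v_count
  FROM   svc_order so
  JOIN   event e ON so.svc_order_id = e.svc_order_id
  WHERE  so.entered_date >= v_entered_from
    AND  e.event_type = 230
    AND  e.event_level = 'O'
    AND  e.current_sched_date BETWEEN v_sched_from AND v_sched_to
    AND  ((so.sots_ta = 'N' AND so.action_type = 0)
       OR (so.sots_ta IS NULL AND so.action_type = 0)
       OR (so.sots_ta = 'N' AND so.action_type IS NULL))
    AND  so.company_code = 'LL';

  DBMS_OUTPUT.PUT_LINE('count = ' || v_count);
END;
/
```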

I have modified the query and also combined one redundant WHERE condition. Remember that if this change works for you right now, that doesn't mean it will always work. As soon as your tables start accumulating more data that qualifies for those WHERE conditions, this will come back as a slow query. So this might work in the short term, but in the longer term you have to think about alternative options:

    1)  For example, indexed views (materialized views in Oracle) on top of these tables.
    2)  Create the same tables under different names and sync data
        between the new and original tables using insert/update/delete triggers.




    -- Note: INTERSECT returns distinct svc_order_id values, so this counts
    -- matching orders; the join version counts matching (order, event) pairs.
    SELECT COUNT(1) AS count, 'DD' test_section_value, 0 test_section
    FROM
    (
        SELECT  so.svc_order_id
        FROM    svc_order so
        WHERE   so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
                AND so.company_code = 'LL'
                -- sots_ta and action_type are svc_order columns, so they are filtered here
                AND (
                        (( so.sots_ta = 'N' ) AND ( so.action_type IS NULL OR so.action_type = 0))
                        OR
                        (( so.sots_ta IS NULL ) AND ( so.action_type = 0 ))
                    )

        INTERSECT

        SELECT  e.svc_order_id
        FROM    event e
        WHERE   e.event_type = 230
                AND e.event_level = 'O'
                AND e.current_sched_date BETWEEN
                    to_date('09/01/2010 00:00:00','MM/DD/YYYY HH24:MI:SS')
                    AND to_date('09/29/2013 23:59:59','MM/DD/YYYY HH24:MI:SS')
    ) qry1
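
Option 1 in the list above mentions indexed views; the closest Oracle feature is a materialized view. A minimal sketch follows, assuming the CREATE MATERIALIZED VIEW privilege is available (which may not be the case given the constraints in the question). The name `svc_order_event_mv` and the choice of a complete, on-demand refresh are illustrative assumptions.

```sql
-- Hypothetical materialized view that pre-joins the two tables.
CREATE MATERIALIZED VIEW svc_order_event_mv
  BUILD IMMEDIATE
  REFRESH COMPLETE ON DEMAND
AS
SELECT so.svc_order_id,
       so.entered_date,
       so.company_code,
       so.sots_ta,
       so.action_type,
       e.event_type,
       e.event_level,
       e.current_sched_date
FROM   svc_order so
JOIN   event e ON so.svc_order_id = e.svc_order_id;

-- Refresh on demand ('C' = complete refresh), e.g. from a scheduled job.
BEGIN
  DBMS_MVIEW.REFRESH('SVC_ORDER_EVENT_MV', 'C');
END;
/
```

The count query would then read from the materialized view with the same WHERE conditions, at the cost of results being only as fresh as the last refresh.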

Answer 4

First, ensure statistics are up-to-date.

begin
    dbms_stats.gather_table_stats('[schema]', 'svc_order');
    dbms_stats.gather_table_stats('[schema]', 'event');
end;
/
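
One way to check whether the statistics are stale before (or after) gathering them is to look at the data dictionary. This sketch assumes both tables live in the current schema; otherwise `all_tables` with an `owner` filter would be needed.

```sql
-- num_rows and last_analyzed reflect the most recent statistics gathering.
SELECT table_name, num_rows, last_analyzed
FROM   user_tables
WHERE  table_name IN ('SVC_ORDER', 'EVENT');
```

A NULL or very old last_analyzed, or a num_rows far from the roughly one million rows mentioned in the question, would suggest the optimizer is working from outdated information.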

This query is a very simple join between two small tables, but with complex predicates. You almost certainly do not want to significantly rewrite all of your queries in search of some magic syntax that will make everything run fast. Yes, there are some rare cases where BETWEEN does not work well, or where moving the predicates into an inline view helps, or where replacing the join with an INTERSECT might help. But that sounds like cargo-cult programming to me. Ask yourself, why would those changes make any difference? If those types of changes always improved performance, why wouldn't Oracle just translate the queries internally?

Normally, you should try to provide better information to the optimizer so it can make better decisions. Usually this is as simple as gathering statistics with the default settings. Some predicates are just too complex, and for those you should try dynamic sampling, such as /*+ dynamic_sampling(6) */. Or maybe add some histograms. Or perhaps add an expression statistic like this:

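--Quotes inside the string literal are doubled, and the columns are
--referenced without the query's table alias.
--If Oracle rejects a boolean condition as an extension, a column group
--such as '(sots_ta, action_type)' is an alternative.
SELECT 
    DBMS_STATS.CREATE_EXTENDED_STATS(null,'SVC_ORDER',
        '(((sots_ta = ''N'') and (action_type = 0)) 
        or  ((sots_ta is null) and (action_type = 0)) 
        or  ((sots_ta = ''N'') and (action_type is null)))'
    ) 
FROM DUAL;
--Don't forget to re-gather statistics after this.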
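
The dynamic sampling and histogram options mentioned above might be applied as in the sketch below. The sampling level 6 comes from the text; using SKEWONLY for the histograms is an assumption, and '[schema]' is the same placeholder as in the earlier statistics block.

```sql
-- Statement-level dynamic sampling: the optimizer samples the tables at parse time.
select /*+ dynamic_sampling(6) */
       0 test_section, count(1) count, 'DD' test_section_value
from svc_order so, event e
where so.svc_order_id = e.svc_order_id
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
  and e.event_type = 230 and e.event_level = 'O'
  and e.current_sched_date between
      to_date('09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')
  and (((so.sots_ta = 'N') and (so.action_type = 0))
       or ((so.sots_ta is null) and (so.action_type = 0))
       or ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL';

-- Histograms on skewed columns (SKEWONLY is an assumed choice, not from the answer).
begin
    dbms_stats.gather_table_stats(
        ownname    => '[schema]',
        tabname    => 'SVC_ORDER',
        method_opt => 'FOR ALL COLUMNS SIZE SKEWONLY');
end;
/
```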

The optimizer is probably under-estimating the number of rows and using a nested loop instead of a hash join. After providing it with more information, ideally it will start using a hash join. But at some point, after you've tried the above methods and possibly many other features, you can just tell it what kind of join to use. That would be @Florin Ghita's suggestion, /*+use_hash(so e)*/.
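
To check which join method the optimizer picks, and whether the hint changes it, the plan can be inspected with EXPLAIN PLAN and DBMS_XPLAN. This is a sketch of that check using the query from the question with the use_hash hint added.

```sql
-- Generate the plan for the hinted query without executing it.
EXPLAIN PLAN FOR
select /*+ use_hash(so e) */
       0 test_section, count(1) count, 'DD' test_section_value
from svc_order so, event e
where so.svc_order_id = e.svc_order_id
  and so.entered_date >= to_date('01/01/2012', 'MM/DD/YYYY')
  and e.event_type = 230 and e.event_level = 'O'
  and e.current_sched_date between
      to_date('09/01/2010 00:00:00', 'MM/DD/YYYY HH24:MI:SS')
      and to_date('09/29/2013 23:59:59', 'MM/DD/YYYY HH24:MI:SS')
  and (((so.sots_ta = 'N') and (so.action_type = 0))
       or ((so.sots_ta is null) and (so.action_type = 0))
       or ((so.sots_ta = 'N') and (so.action_type is null)))
  and so.company_code = 'LL';

-- Show the plan; look for HASH JOIN versus NESTED LOOPS and the estimated rows.
SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY);
```

Running the same two statements without the hint gives the baseline plan to compare against.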

4 answers

Answer 1
1 accepted 2013-10-04 08:38:18

Answer 2
0 2013-10-02 12:46:05

Answer 3
0 2013-10-02 13:49:29

Answer 4
0 2013-10-02 18:17:48
