Oracle - Query running very slow

Question

I have a simple query that runs forever. There is one date condition, and when I remove it the query returns results right away. It's a date field in the format '31-MAR-15'. I don't understand why this condition makes the query so slow. Thanks in advance.

SELECT
  substr(a.id, 1, 2)   AS country,
  count(DISTINCT a.id) AS id_count,
  sum(a.amount)        AS amount
FROM table1 a
  JOIN table2 b ON a.id = b.id
  JOIN table3 c ON b.party_id = c.party_id
WHERE a.prod_type = 'INS'
  AND c.acct_type = 'LON'
  AND substr(a.id, 1, 2) = 'US'
  AND a.dump_dt = '31-MAR-15'
  AND substr(id, 4, 8) = '20150303'
GROUP BY substr(a.id, 1, 2);

Explain Plan:

PLAN_TABLE_OUTPUT
Plan hash value: 255044277

------------------------------------------------------------------------------------------------------------
| Id  | Operation                         | Name                   | Rows  | Bytes | Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                  |                        |     1 |   121 |   125K  (1)| 00:25:08 |
|   1 |  HASH GROUP BY                    |                        |     1 |   121 |   125K  (1)| 00:25:08 |
|   2 |   VIEW                            | VW_DAG_0               |     1 |   121 |   125K  (1)| 00:25:08 |
|   3 |    HASH GROUP BY                  |                        |     1 |    98 |   125K  (1)| 00:25:08 |
|   4 |     NESTED LOOPS                  |                        |       |       |            |          |
|   5 |      NESTED LOOPS                 |                        |     1 |    98 |   125K  (1)| 00:25:08 |
|   6 |       MERGE JOIN CARTESIAN        |                        | 12613 |   800K| 21133   (2)| 00:04:14 |
|*  7 |        TABLE ACCESS BY INDEX ROWID| TABLE1                 |     1 |    45 |    46   (0)| 00:00:01 |
|*  8 |         INDEX RANGE SCAN          | DATA_DATE__STG_BACKUP2 |  1040 |       |     6   (0)| 00:00:01 |
|   9 |        BUFFER SORT                |                        |   182K|  3564K| 21087   (2)| 00:04:14 |
|* 10 |         TABLE ACCESS FULL         | TABLE3                 |   182K|  3564K| 21087   (2)| 00:04:14 |
|* 11 |       INDEX RANGE SCAN            | BSB_PARTYID_IDX        |    22 |       |     3   (0)| 00:00:01 |
|* 12 |      TABLE ACCESS BY INDEX ROWID  | TABLE2                 |     1 |    33 |    10   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   7 - filter(SUBSTR(A.ID, 4, 8) = '20150303' AND SUBSTR(A.ID, 1, 2) = 'US'
              AND A.PROD_TYPE = 'INS')
   8 - access(A.DUMP_DT = '31-MAR-15')
  10 - filter(C.ACCT_TYPE = 'LON')
  11 - access(B.PARTY_ID = C.PARTY_ID)
  12 - filter(A.ID = B.ID)

Answer 1

It looks like the optimizer is significantly underestimating the number of rows returned after applying these 4 predicates on TABLE1:

A.PROD_TYPE = 'INS'
SUBSTR(A.ID, 1, 2) = 'US'
A.DUMP_DT = '31-MAR-15'
SUBSTR(ID, 4, 8) = '20150303'
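To see how four individually reasonable predicates can shrink the estimate to a single row, here is a simplified Python model of the optimizer's independence assumption: the selectivities are simply multiplied. The 1040 rows come from step 8 of the plan. Everything else is an assumption: the selectivity of PROD_TYPE = 'INS' and the "actual" fractions are hypothetical, and 1% is the commonly cited default for an equality predicate on a function of a column that has no statistics. Oracle's real arithmetic has more special cases than this sketch.

```python
def estimated_rows(base_rows, selectivities):
    """Simplified cardinality estimate: multiply the selectivities as if the
    predicates were independent, and never report fewer than 1 row."""
    rows = float(base_rows)
    for selectivity in selectivities:
        rows *= selectivity
    return max(1, round(rows))


# Rows returned by the index range scan on DUMP_DT (plan step 8).
ROWS_FOR_DATE = 1040

# Hypothetical selectivity of PROD_TYPE = 'INS'.
PROD_TYPE_SEL = 0.2
# Assumed default for "function(column) = constant" without statistics.
DEFAULT_FUNCTION_SEL = 0.01

# Both SUBSTR predicates get the 1% default, so the estimate collapses.
optimizer_guess = estimated_rows(
    ROWS_FOR_DATE, [PROD_TYPE_SEL, DEFAULT_FUNCTION_SEL, DEFAULT_FUNCTION_SEL]
)

# Hypothetical real fractions: most IDs start with 'US' and half of those
# carry the '20150303' segment, so the SUBSTR predicates filter far less.
actual_rows = estimated_rows(ROWS_FOR_DATE, [PROD_TYPE_SEL, 0.6, 0.5])

print(optimizer_guess, actual_rows)  # 1 62
```

With an estimate of 1 row from TABLE1, a Cartesian join against TABLE3 looks cheap to the optimizer. With dozens of rows, it becomes very expensive.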

(Slightly off-topic: it's safer to use an ANSI date literal, DATE '2015-03-31', instead of the implicitly converted string '31-MAR-15'. Also, the statement has a few errors, such as a missing condition between the first two predicates and the missing A. in front of ID in the last predicate.)

First, make sure there are accurate statistics on all tables and see if that changes the explain plan:

begin
    dbms_stats.gather_table_stats(user, 'TABLE1');
    dbms_stats.gather_table_stats(user, 'TABLE2');
    dbms_stats.gather_table_stats(user, 'TABLE3');
end;
/

The "smart column" ID makes it difficult to estimate the number of rows returned after applying conditions. If it's too late to change the data model, you can at least provide Oracle with some extended statistics to help it deal with the predicates:

select dbms_stats.create_extended_stats(user, 'TABLE1', '(SUBSTR(ID, 1, 2))') from dual;
select dbms_stats.create_extended_stats(user, 'TABLE1', '(SUBSTR(ID, 4, 8))') from dual;

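I'm guessing that SUBSTR(A.ID, 1, 2) = 'US' is a popular value, but without the extended statistics Oracle won't know that. After the extended statistics are created, gather statistics on TABLE1 again so they get populated. The extra histogram may then significantly increase the cardinality estimate, and the optimizer would no longer choose the Cartesian join between TABLE1 and TABLE3, two tables that are not directly joined to each other.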
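A frequency histogram on the SUBSTR(ID, 1, 2) expression essentially tells the optimizer what fraction of rows has each prefix, instead of leaving it to guess. The Python sketch below models that with a made-up list of IDs (hypothetical data and ID layout). The helper oracle_substr is a hypothetical stand-in that mimics Oracle's 1-based SUBSTR for positive start positions.

```python
from collections import Counter


def oracle_substr(value, position, length):
    """Mimic Oracle SUBSTR(value, position, length) for position >= 1.
    Oracle counts characters from 1, Python slices from 0."""
    start = position - 1
    return value[start:start + length]


def prefix_selectivity(ids, prefix):
    """Fraction of rows whose SUBSTR(id, 1, 2) equals prefix, which is what a
    frequency histogram on that expression would tell the optimizer."""
    histogram = Counter(oracle_substr(i, 1, 2) for i in ids)
    return histogram[prefix] / len(ids)


# Hypothetical sample: 8 of 10 IDs are US accounts.
sample_ids = ["US-20150303-%03d" % n for n in range(8)] + [
    "CA-20150303-001",
    "MX-20150303-001",
]

DEFAULT_FUNCTION_SEL = 0.01  # assumed default without extended statistics
print(prefix_selectivity(sample_ids, "US"), DEFAULT_FUNCTION_SEL)  # 0.8 0.01
```

If the real data looks anything like this, the histogram-based selectivity is 80 times larger than the default guess.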

Answer 2

I've simplified the conditions on A.ID in the WHERE clause:

A.ID LIKE 'US_20150303%'

has the same effect as

substr(a.id, 1, 2) = 'US' AND substr(a.id, 4, 8) = '20150303'

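Also, if column A.ID is indexed, applying the SUBSTR function to it prevents the optimizer from using that index (unless there is a matching function-based index), whereas LIKE with a constant prefix such as 'US' can still use an index range scan.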
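The claimed equivalence can be checked with a small Python model. The hypothetical helper like_to_regex translates a LIKE pattern ('%' for any run of characters, '_' for exactly one) into a regular expression, and the SUBSTR conditions are evaluated with Python slices shifted to 1-based positions. The sketch ignores ESCAPE clauses and NULLs, and the sample IDs are made up.

```python
import re


def like_to_regex(pattern):
    """Translate a LIKE pattern without an ESCAPE clause into a regex:
    '%' matches any run of characters (including none), '_' exactly one."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def matches_like(value, pattern):
    # LIKE must match the whole value, hence fullmatch.
    return like_to_regex(pattern).fullmatch(value) is not None


def matches_substr(value):
    # substr(a.id, 1, 2) = 'US' AND substr(a.id, 4, 8) = '20150303'
    # 1-based (4, 8) is the 0-based slice [3:11].
    return value[0:2] == "US" and value[3:11] == "20150303"


samples = [
    "US-20150303-001",  # matches both
    "US_20150303",      # shortest matching ID (11 characters)
    "US-2015030",       # too short for the date segment
    "CA-20150303-001",  # wrong country
    "US-20150304-001",  # wrong date
    "USX20150303ABC",   # any character is allowed at position 3
]

for sample in samples:
    print(sample, matches_like(sample, "US_20150303%"), matches_substr(sample))
```

Both columns of the output agree for every sample, including the too-short ID, because both forms require at least 11 characters.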

On the other hand, a.dump_dt seems to be a DATE type column, so a preferred way to apply a filter on this column could be

a.dump_dt = TO_DATE('31-MAR-15', 'DD-MON-RR')

instead of

a.dump_dt = '31-MAR-15'

The latter relies on an implicit conversion that depends on the NLS_DATE_FORMAT of the session (usually set by the Oracle client that runs the query), and in some cases it can hurt performance by preventing the use of an index on a.dump_dt.
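The dependence on NLS_DATE_FORMAT can be modeled in Python. The same string '31-MAR-15' only becomes 31 March 2015 if the session's date format happens to match it. rr_year implements Oracle's RR century rule for two-digit years. The function names, the current_year parameter and the choice of the two modeled format masks are hypothetical simplifications.

```python
from datetime import date, datetime

MONTHS = {
    name: number
    for number, name in enumerate(
        ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
         "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"],
        start=1,
    )
}


def rr_year(two_digit_year, current_year):
    """Oracle's RR rule: choose the century from the two-digit year and
    the last two digits of the current year."""
    century = current_year // 100 * 100
    current_two_digits = current_year % 100
    if two_digit_year < 50 and current_two_digits >= 50:
        century += 100  # e.g. '15' read in 2060 means 2115
    elif two_digit_year >= 50 and current_two_digits < 50:
        century -= 100  # e.g. '99' read in 2015 means 1999
    return century + two_digit_year


def to_date_dd_mon_rr(text, current_year):
    """Model of TO_DATE(text, 'DD-MON-RR') for input like '31-MAR-15'."""
    day, month, year = text.split("-")
    return date(rr_year(int(year), current_year), MONTHS[month.upper()], int(day))


def implicit_to_date(text, nls_date_format, current_year):
    """Model of comparing a DATE column with a string literal: the string is
    converted using the session's NLS_DATE_FORMAT. Only two masks are modeled."""
    if nls_date_format == "DD-MON-RR":
        return to_date_dd_mon_rr(text, current_year)
    if nls_date_format == "YYYY-MM-DD":
        # Raises ValueError for '31-MAR-15', much like Oracle's conversion error.
        return datetime.strptime(text, "%Y-%m-%d").date()
    raise NotImplementedError(nls_date_format)


print(implicit_to_date("31-MAR-15", "DD-MON-RR", 2015))  # 2015-03-31
try:
    implicit_to_date("31-MAR-15", "YYYY-MM-DD", 2015)
except ValueError:
    print("same literal fails under a different NLS_DATE_FORMAT")

# An ISO-style string (like the ANSI literal DATE '2015-03-31') has one meaning.
print(date.fromisoformat("2015-03-31"))  # 2015-03-31
```

An explicit TO_DATE with a format mask, or an ANSI DATE literal, removes the dependence on session settings.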

So the rewritten query looks like this:

SELECT
  SUBSTR(A.ID, 1, 2)   AS country,
  COUNT(DISTINCT A.ID) AS id_count,
  SUM(A.amount)        AS amount
FROM table1 A
  JOIN table2 b ON A.ID = b.ID
  JOIN table3 c ON b.party_id = c.party_id
WHERE A.prod_type = 'INS'
  AND c.acct_type = 'LON'
  AND A.ID LIKE 'US_20150303%'
  AND A.dump_dt = TO_DATE('31-MAR-15', 'DD-MON-RR')
GROUP BY SUBSTR(A.ID, 1, 2);

Answer 3

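Try using Oracle hints to stabilize the execution plan, or use the trick below. Adding +0 turns A.DUMP_DT into an expression, so the optimizer can no longer use the index on DUMP_DT (DATA_DATE__STG_BACKUP2 in the plan) for this predicate and has to choose a different access path: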

...
AND A.DUMP_DT + 0 = TO_DATE('31-MAR-15', 'DD-MON-RR')
...

3 answers

solution1
1 2015-04-13 07:08:38

solution2
1 2016-07-01 00:55:44

solution3
-1 2015-04-13 13:52:17
