简体   繁体   中英

Oracle SQL how to optimize IN uncorrelated subquery

I have the following query which performs badly:

select 
    distinct 
    u.uuid
    u.user_name,
    u.key

    from request req 
         join int_user u on u.uuid = req.user_uuid
         join int_right r on r.uuid = req.right_uuid

    where r.uuid in (
            select r2.uuid from int_right r2
                    where 
                            (
                                lower(r2.right_name) like '%keyword%'
                                or lower(r2.right_key) like '%keyword%'
                            )

                    )

The sub-query is uncorrelated and it will usually return few rows, sometimes only one row. Now I don't understand why if I take the sub-query and execute it separately then take the result list and add it statically with IN operator to outer query then it will perform very well, from 3-6s execution time down to 0.05s.

r.uuid in ('value1', 'value2', 'value3')

How could I tell oracle to execute my sub-query first then apply the result set to outer query?

Few notes:

  • Request tables is huge - around 7mln rows
  • Int_right table - around 10K rows
  • Int_user table - around 100K rows

From execution plan it seems that oracle does a full scan in all tables. The cost and cardinality is very big on request table. It's also interesting that even if my sub-query would return a single row for a certain search criteria the query is still slow, however if I would replace IN operator with equals(=) then the query becomes very fast and the cost low. It also appears that in this case oracle would only do a full scan in int_right table and for other table it would do a unique or range scan.

I also tried other variants of this query, like adding the conditions directly to outer query, use exists or correlated sub-query but it's still slow in any case.

Why do you need a sub-query ?

The same conditions can be applied in 2 different ways:

In Join

select 
    distinct 
    u.uuid
    u.user_name,
    u.key

    from request req 
         join int_user u on u.uuid = req.user_uuid
         join int_right r on r.uuid = req.right_uuid 
         And (lower(r.right_name) like '%keyword%' or lower(r.right_key) like '%keyword%')

In Where

select 
    distinct 
    u.uuid
    u.user_name,
    u.key

    from request req 
         join int_user u on u.uuid = req.user_uuid
         join int_right r on r.uuid = req.right_uuid 
    Where (lower(r.right_name) like '%keyword%' or lower(r.right_key) like '%keyword%')

Both will result in a much quicker query although I am not 100% sure which 1 will be faster. From my understanding the one in the join will be quicker...

You generally cannot tune a SQL statement by looking at the text (unless there is a fundamental flaw in the code, eg missing a join condition etc). For Oracle, one of the most productive ways to proceed is:

1) Execute the problematic statement with an additional hint as below

select /*+ gather_plan_statistics */ ... <rest of query>

2) Run the following to get execution plan metrics

select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'))

That way you will:

a) See the true execution plan that was used,

b) Get estimate/actual row source counts for each step in the plan. If the estimate vs actuals are way out, then that typically is where to focus your attention because this is where the optimizer most probably did not have sufficient or accurate enough information to work with.

For example

SQL> select /*+ gather_plan_statistics */ count(dname)
  2  from  scott.emp e, scott.dept d
  3  where e.sal <= 1500
  4  and  d.deptno = e.deptno;

COUNT(DNAME)
------------
           7

1 row selected.

SQL> select * from table(dbms_xplan.display_cursor(null,null,'ALLSTATS LAST'));

PLAN_TABLE_OUTPUT
---------------------------------------------------------------------------------------------------
SQL_ID  c1cb4s8b141h8, child number 0
-------------------------------------
select /*+ gather_plan_statistics */ count(dname) from  scott.emp e,
scott.dept d where e.sal <= 1500 and  d.deptno = e.deptno

Plan hash value: 3037575695

---------------------------------------------------------------------------------------------------
| Id  | Operation                     | Name    | Starts | E-Rows | A-Rows |   A-Time   | Buffers |
---------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT              |         |      1 |        |      1 |00:00:00.01 |       9 |
|   1 |  SORT AGGREGATE               |         |      1 |      1 |      1 |00:00:00.01 |       9 |
|   2 |   MERGE JOIN                  |         |      1 |      3 |      7 |00:00:00.01 |       9 |
|   3 |    TABLE ACCESS BY INDEX ROWID| DEPT    |      1 |      4 |      4 |00:00:00.01 |       2 |
|   4 |     INDEX FULL SCAN           | DEPT_PK |      1 |      4 |      4 |00:00:00.01 |       1 |
|*  5 |    SORT JOIN                  |         |      4 |      3 |      7 |00:00:00.01 |       7 |
|*  6 |     TABLE ACCESS FULL         | EMP     |      1 |      3 |      7 |00:00:00.01 |       7 |
---------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   5 - access("D"."DEPTNO"="E"."DEPTNO")
       filter("D"."DEPTNO"="E"."DEPTNO")
   6 - filter("E"."SAL"<=1500)

On line 6, you can that the optimizer estimated 3 rows but actually got back 7. Large discrepancies indicate areas to investigate.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM