column order in sql where clause

All

  1. Does the order of the conditions in the SQL WHERE clause affect efficiency? I searched the web and found suggestions that all say to put the join conditions first and the other filters after them. For example,

     select t1.a, t2.b from table1 t1, table2 t2 where t1.a = t2.a and t1 > 1 and t2 > 2; 

Is there any problem if I move t1 > 1 in front of t1.a = t2.a?

  2. I want to know about the column order in an EXISTS clause. For example,

     select t1.a, t2.b from table1 t1, table2 t2 where t1.a = t2.a and not exists (select 1 from table3 t3 where t1.a = t3.a and t1.b = t3.b) 

I wonder whether there is any effect if I change t1.b = t3.b to t3.b = t1.b. Is the column order critical?

  3. Can anyone explain the steps by which a SQL SELECT statement is executed, and give some useful tutorial links?

To avoid mixing join conditions with filters in the WHERE clause, use the explicit ANSI SQL-92 syntax with the JOIN keyword instead:

select t1.a, t2.b 
from table1 t1
inner join table2 t2 on t1.a = t2.a
where  t1 > 1 and t2 > 2;

This way, you no longer have the join conditions in the WHERE clause, and you can easily add more conditions.

For your second query, you can use an OUTER JOIN instead of NOT EXISTS:

select t1.a, t2.b 
from table1 t1
inner join table2 t2 on t1.a = t2.a
LEFT JOIN table3 t3 on t1.a = t3.a and t1.b = t3.b
where t3.a IS NULL;
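
As a rough check of the rewrite above, the following sketch compares the NOT EXISTS form, the same form with the operands of each equality swapped, and the LEFT JOIN ... IS NULL form. It is an assumption-laden illustration: it uses Python's built-in sqlite3 module as a stand-in for Oracle, the sample rows are made up, and it only shows that the three forms return the same rows. It says nothing about which plan any optimizer would choose.

```python
import sqlite3

# In-memory SQLite database standing in for Oracle (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (a integer, b integer);
    create table table2 (a integer, b integer);
    create table table3 (a integer, b integer);
    -- Hypothetical sample rows, not taken from the question.
    insert into table1 values (1, 10), (2, 20), (3, 30), (4, 40);
    insert into table2 values (1, 100), (2, 200), (3, 300), (5, 500);
    insert into table3 values (1, 10), (1, 10), (3, 99);
""")

# Original anti-join written with NOT EXISTS.
not_exists_sql = """
    select t1.a, t2.b
    from table1 t1
    inner join table2 t2 on t1.a = t2.a
    where not exists (select 1 from table3 t3
                      where t1.a = t3.a and t1.b = t3.b)
"""

# Same query with the operands of each equality swapped.
swapped_sql = """
    select t1.a, t2.b
    from table1 t1
    inner join table2 t2 on t1.a = t2.a
    where not exists (select 1 from table3 t3
                      where t3.a = t1.a and t3.b = t1.b)
"""

# Anti-join written as LEFT JOIN ... IS NULL.
left_join_sql = """
    select t1.a, t2.b
    from table1 t1
    inner join table2 t2 on t1.a = t2.a
    left join table3 t3 on t1.a = t3.a and t1.b = t3.b
    where t3.a is null
"""

def rows(sql):
    """Run a query and return its rows in a stable order for comparison."""
    return sorted(conn.execute(sql).fetchall())

print(rows(not_exists_sql))
print(rows(swapped_sql))
print(rows(left_join_sql))
```

Equality is symmetric, so swapping t1.b = t3.b to t3.b = t1.b cannot change the result; whether it changes the plan depends on the optimizer.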

I know this does not answer your question directly, but the old join syntax is the source of all this vagueness about the conditions in the WHERE clause. It is not recommended, so try to avoid it.

The people saying the order of predicates in the WHERE clause does not affect performance are right most of the time. However, it remains the case that we often know more about the data in our system, especially its distribution and skew, than the optimizer can figure out from its statistics. In such situations it is important for us to give the optimizer the best information we can.

Here is an example abstracted from a real-life situation I encountered last week (I haven't got time to run up a test case right now but I will later).

The situation: three tables, a parent and two children. The task is to select unique rows in the parent by looking up non-unique rows in its children. All tables have many millions of rows.

My first attempt was something like this:

select parent.*
from   child1
       , child2
       , parent
where child1.col_a = 'whatever'
and child2.col_n = 9999
and child1.parent_id = child2.parent_id
and parent.id = child2.parent_id

The query returned the correct result set but the performance was pretty poor. The explain plan showed that the query was driving off CHILD2. This was wrong, because the filter on CHILD1 was much more selective. So I re-wrote the query like this:

select parent.*
from   child1
       , child2
       , parent
where child1.col_a = 'whatever'
and child2.col_n = 9999
and child2.parent_id = child1.parent_id
and parent.id = child1.parent_id

Now the query drove off CHILD1 and the performance improved by more than an order of magnitude. That won't happen all the time but I say fiddling with the WHERE clause is still a valid tuning technique.

The CBO is a very smart piece of software. Nevertheless we can't just throw a higgledy-piggledy WHERE clause at it and expect it to produce the best explain plan every time. The more complicated the query (where complexity is defined by the number of joins), the more important it is that we organize the WHERE clause in a meaningful fashion. It doesn't do any harm and it might just do some good.


By the way, those people saying that the ANSI join syntax doesn't affect performance are, again, only right most of the time. Occasionally it does lead the optimizer to produce inferior execution plans. Find out more.

The ordering of predicates in the WHERE clause is not important under the cost-based optimizer (CBO), as the CBO will happily rearrange the predicates as it sees fit (unless you've stuck a hint in there to tell Oracle not to do this).

It was true for the rule based optimizer that you should pay attention to the ordering of predicates, but hopefully you're not on a very old version of Oracle that still uses the RBO.

The order in the where clause is not important. The ANSI JOIN syntax improves readability, but it doesn't affect performance. To prove the point:

create table t1 (a number, x number);
create table t2 (a number, b number);
insert into t1 (a, x) select level, mod(level,100) from dual connect by level <= 100000;
insert into t2 (a, b) select level, mod(level,10)  from dual connect by level <= 100000;
exec dbms_stats.gather_table_stats(user,'t1');
exec dbms_stats.gather_table_stats(user,'t2');
set autotrace trace explain

All four queries

select t1.a, t2.b from t1, t2 where t1.a=t2.b and t1.x > 5 and t2.b > 5;
select t1.a, t2.b from t1, t2 where t1.a=t2.b and t2.b > 5 and t1.x > 5;
select t1.a, t2.b from t1, t2 where t1.x > 5 and t2.b > 5 and t1.a=t2.b;
select t1.a, t2.b from t1 join t2 on t1.a=t2.b where t1.x > 5 and t2.b > 5;

produce exactly the same query plan:

Plan hash value: 282751716
---------------------------------------------------------------------------
| Id  | Operation          | Name | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------
|   0 | SELECT STATEMENT   |      | 44444 |   434K|    88   (7)| 00:00:02 |
|*  1 |  HASH JOIN         |      | 44444 |   434K|    88   (7)| 00:00:02 |
|*  2 |   TABLE ACCESS FULL| T2   | 44444 |   130K|    42   (5)| 00:00:01 |
|*  3 |   TABLE ACCESS FULL| T1   | 94946 |   649K|    44   (5)| 00:00:01 |
---------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   1 - access("T1"."A"="T2"."B")
   2 - filter("T2"."B">5)
   3 - filter("T1"."X">5 AND "T1"."A">5)
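
The same four queries can also be checked for result equivalence outside Oracle. The sketch below is a minimal illustration under stated assumptions: it uses Python's sqlite3 module instead of Oracle, loads 1,000 rows per table rather than 100,000, and replaces the `connect by level` row generator with a Python loop. It confirms only that the predicate order and the join syntax do not change the rows returned; it does not reproduce the Oracle plan shown above.

```python
import sqlite3

# SQLite stands in for Oracle here (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute("create table t1 (a integer, x integer)")
conn.execute("create table t2 (a integer, b integer)")

# Same data shape as the Oracle test case, scaled down to 1,000 rows per table.
conn.executemany("insert into t1 (a, x) values (?, ?)",
                 [(i, i % 100) for i in range(1, 1001)])
conn.executemany("insert into t2 (a, b) values (?, ?)",
                 [(i, i % 10) for i in range(1, 1001)])

# The four query variants from the answer, unchanged.
queries = [
    "select t1.a, t2.b from t1, t2 where t1.a=t2.b and t1.x > 5 and t2.b > 5",
    "select t1.a, t2.b from t1, t2 where t1.a=t2.b and t2.b > 5 and t1.x > 5",
    "select t1.a, t2.b from t1, t2 where t1.x > 5 and t2.b > 5 and t1.a=t2.b",
    "select t1.a, t2.b from t1 join t2 on t1.a=t2.b where t1.x > 5 and t2.b > 5",
]

# Sort each result so the comparison ignores row order.
results = [sorted(conn.execute(q).fetchall()) for q in queries]
all_same = all(r == results[0] for r in results)

print("all variants return the same rows:", all_same)
print("row count:", len(results[0]))
```

To compare plans rather than results in SQLite, each query can be prefixed with `explain query plan`; the output format differs from Oracle's autotrace.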

