
Why does checking for null slow this query down?

I have this table containing 7,000 records:

desc ARADMIN.V_PKGXMLCODE

Name                  Null     Type          
--------------------- -------- ------------- 
REQUEST_ID            NOT NULL VARCHAR2(15)  
AVAILABILITY                   VARCHAR2(69)  
XML_CODE                       CLOB          
PACKAGENAME_UNIQUE             VARCHAR2(50)  
CATALOG                        NUMBER(15)    
CHILD                          VARCHAR2(255) 
CLASSIFICATION_SYSTEM          NUMBER(15)    
E_MAIL                         VARCHAR2(69)

The query

SELECT COUNT(*) FROM ARADMIN.V_PKGXMLCODE WHERE (CATALOG <> 0 AND CATALOG <> 2) AND (NOT (CHILD IS NULL));

takes less than one second.

The query

SELECT COUNT(*) FROM ARADMIN.V_PKGXMLCODE WHERE (CATALOG IS NULL OR (CATALOG <> 0 AND CATALOG <> 2)) AND (NOT (CHILD IS NULL));

takes 23 seconds.

The explain plan, however, claims it should run quickly...

(screenshot of the explain plan)

What can I do?

The only way I can think of to get that kind of difference in execution speed would be to (a) have an index on field4 (the question's CATALOG column), and (b) have a lot of empty data blocks, possibly from a high water mark set very high by repeated direct-path loads.

The first query would still use the index and perform as expected. But as null values are not indexed, the index cannot be used to check the "or field4 is null" condition, so it would fall back to a full table scan.
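
(As an aside, not something this answer relies on: if you did want the null check itself to be index-driven, a common workaround is a composite index whose trailing expression is a constant, so the key is never entirely null. A minimal sketch, using this answer's generic MYTABLE/field4 placeholder names rather than the question's real objects:)

-- An entirely-null key is never stored in a B-tree index, but a key with at
-- least one non-null column is, so adding a constant makes rows with a null
-- field4 indexable and lets a "field4 IS NULL" predicate use the index.
CREATE INDEX ix_mytable_field4 ON MYTABLE (field4, 0);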

That in itself shouldn't be a problem here, as a full table scan of 7000 rows shouldn't take long. But since it is taking so long, something else is going on. A full table scan has to examine every data block allocated to the table to see whether it contains any rows, and the time it's taking suggests there are a lot more blocks than you need to hold 7000 rows, even with inline CLOB storage.

The simplest way to get a lot of empty data blocks is to have a lot of data and then delete most of it. But I believe you said in a now-deleted comment on an earlier question that performance used to be OK and has got worse. That can happen if you do direct-path inserts, particularly if you 'refresh' data by deleting it and then inserting new data in direct-path mode. You could be doing that with inserts that have the /*+ append */ hint; or in parallel; or through SQL*Loader. Each time you did that the high water mark would move, as old empty blocks wouldn't be reused; and each time performance of the query that checks for nulls would degrade a little. After a lot of iterations that would really start to add up.
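
For illustration, a hedged sketch of that refresh pattern, with a made-up staging table as the source:

-- Hypothetical sketch of the delete-then-append 'refresh' described above.
-- The delete empties blocks below the high water mark, but the direct-path
-- insert writes only above the high water mark, so the emptied blocks are
-- never reused and the segment keeps growing.
DELETE FROM MYTABLE;
COMMIT;

INSERT /*+ APPEND */ INTO MYTABLE
SELECT * FROM MYTABLE_STAGING;  -- MYTABLE_STAGING is a made-up source table
COMMIT;                         -- required before the table can be queried again in this session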

You can check the data dictionary to see how much space is allocated to your table (user_segments etc.), and compare that to the size of the data you think you actually have. You can reset the HWM by rebuilding the table, e.g. by doing:

alter table mytable move;

(preferably in a maintenance window!)
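
Something along these lines would show the allocated space versus the space the rows should actually need (assuming the object is a table in your own schema and its statistics are reasonably fresh; use dba_segments/dba_tables with an OWNER filter otherwise):

-- Space allocated to the segment
SELECT segment_name, blocks, bytes/1024/1024 AS mb
FROM   user_segments
WHERE  segment_name = 'V_PKGXMLCODE';

-- Rough real data volume: num_rows * avg_row_len, versus blocks allocated
SELECT num_rows, blocks, avg_row_len
FROM   user_tables
WHERE  table_name = 'V_PKGXMLCODE';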

As a demo I ran a cycle to direct-path insert and delete 7000 rows over a hundred times, and then ran both your queries. The first one took 0.06 seconds (much of which is SQL Developer overhead); the second took 1.26 seconds. (I also ran Gordon's, which got a similar time, as it still has to do a full table scan.) With more iterations the difference would become even more marked, but I ran out of space... I then did an alter table move and re-ran your second query, which then took 0.05 seconds.
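
Roughly the loop used for that demo (a reconstruction using the same made-up names as above, not the exact script):

BEGIN
  FOR i IN 1 .. 100 LOOP
    DELETE FROM MYTABLE;
    COMMIT;
    -- direct-path insert: each pass writes above the current high water mark
    INSERT /*+ APPEND */ INTO MYTABLE
    SELECT * FROM MYTABLE_STAGING;  -- hypothetical 7000-row source
    COMMIT;                         -- commit before the next iteration touches the table
  END LOOP;
END;
/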

That is interesting. I would expect the two queries to have the same performance, because Oracle has a good optimizer and shouldn't be confused by the NULL.

How does this version have better performance?

select x1.cnt + x2.cnt + x3.cnt
from (select count(*) as cnt
      from MYTABLE
      where field4 = 1 and child is not null
     ) x1 cross join
     (select count(*) as cnt
      from MYTABLE
      where field4 = 4 and child is not null
     ) x2 cross join
     (select count(*) as cnt
      from MYTABLE
      where field4 is null and child is not null
     ) x3;

This version should be able to take advantage of an index on MYTABLE(field4, child).
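
For clarity, that index would be something like the statement below (the names are this answer's generic placeholders). Because every branch also requires child IS NOT NULL, the index key is never entirely null, so even the rows where field4 is null appear in the index and each COUNT should be answerable from the index alone.

-- Composite index assumed by this answer (placeholder names)
CREATE INDEX ix_mytable_field4_child ON MYTABLE (field4, child);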

I was actually struggling with a similar issue. I had a condition where I needed to filter out all the NULL values from my query.

I started with:

ColumnName IS NOT NULL

This increased my query time manifold. I tried several things after this, like functions that would just return what I needed, but those did not work either. Finally a small change did the trick; what I did was:

IsNull(ColumnName,'') <> ''

And it worked. I am not entirely sure what the difference is, but it worked.
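
For reference, IsNull(col, '') is SQL Server syntax; the Oracle-flavoured equivalent of the same rewrite would be something like the sketch below, with a made-up sentinel value (Oracle treats an empty string as NULL, so a value the column can never hold has to be used instead). Note that wrapping the column in a function generally stops a plain index on that column from being used, so this is a workaround rather than a general recommendation.

-- Same idea in Oracle syntax; '~none~' is an arbitrary value the column never holds
SELECT COUNT(*)
FROM   MYTABLE
WHERE  NVL(ColumnName, '~none~') <> '~none~';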

Note that ISNULL with two parameters does not work in every database; in MySQL you get the error "Incorrect parameter count in the call to native function 'ISNULL'".
