简体   繁体   中英

Why does this query not use the index?

I encountered a strange behaviour of the Postgres optimizer on the following query:

select count(product0_.id) as col_0_0_ from Product product0_ 
 where product0_.active=true 
 and (product0_.aggregatorId is null 
 or product0_.aggregatorId in ($1 , $2 , $3))

Product has about 54 columns, active is a boolean having a btree index, and aggregatorId is 'varchar(15)` and has a btree index.

On this query above the index for 'aggregatorId' is not used:

Aggregate  (cost=169995.75..169995.76 rows=1 width=32) (actual time=3904.726..3904.727 rows=1 loops=1)
  ->  Seq Scan on product product0_  (cost=0.00..165510.39 rows=1794146 width=32) (actual time=0.055..2407.195 rows=1851827 loops=1)
        Filter: (active AND ((aggregatorid IS NULL) OR ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))))
        Rows Removed by Filter: 542146
Total runtime: 3904.925 ms

But if we reduce the query by leaving out the null check for this column, the index gets used:

Aggregate  (cost=17600.93..17600.94 rows=1 width=32) (actual time=614.933..614.935 rows=1 loops=1)
  ->  Index Scan using idx_prod_aggr on product product0_  (cost=0.43..17487.56 rows=45347 width=32) (actual time=19.284..594.509 rows=12099 loops=1)
      Index Cond: ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))
      Filter: active
    Rows Removed by Filter: 49130
Total runtime: 150.255 ms

As far as I know a btree index can handle null checks, so I don't understand why the index is not used for the complete query. The product table contains about 2.3 million entries, so it is not very fast.

EDIT: The index is very standard:

CREATE INDEX idx_prod_aggr
  ON product
  USING btree
  (aggregatorid COLLATE pg_catalog."default");

Your problem looked interesting, so I reproduced your scenario - postgres 9.1, table with 1M rows, one boolean column, one varchar column, both indexed, half of table has NULL names.

I had same explain analyze output when varchar column was not indexed. However, with index postgres uses bitmap scan on NULL condition and IN condition and then merges them with OR condition.

Then he uses seq scan on boolean condition (because indexes are separated)

explain analyze
select * from A where active is true and ((name is null) OR (name in ('1','2','3')  ));

See output:

"Bitmap Heap Scan on a  (cost=17.34..21.35 rows=1 width=18) (actual time=0.048..0.048 rows=0 loops=1)"
"  Recheck Cond: ((name IS NULL) OR ((name)::text = ANY ('{1,2,3}'::text[])))"
"  Filter: (active IS TRUE)"
"  ->  BitmapOr  (cost=17.34..17.34 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..4.41 rows=1 width=0) (actual time=0.010..0.010 rows=0 loops=1)"
"              Index Cond: (name IS NULL)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..12.93 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)"
"              Index Cond: ((name)::text = ANY ('{1,2,3}'::text[]))"
"Total runtime: 0.077 ms"

This makes me think that you missed some details, if so, add them to your question.

Since there are many identical values for the column which you use in the where clause (78% of all the table rows according to your numbers), the database will conclude that it is cheaper to use full table scan than to waste additional time to read the index.

The rule of thumb in most database vendors is that index will probably not be used if it can't narrow the search down to about 5% of all the table records.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM