Why doesn't this query use the index?

Question

I encountered strange behaviour from the Postgres optimizer on the following query:

select count(product0_.id) as col_0_0_ from Product product0_ 
 where product0_.active=true 
 and (product0_.aggregatorId is null 
 or product0_.aggregatorId in ($1 , $2 , $3))

Product has about 54 columns; active is a boolean with a btree index, and aggregatorId is varchar(15) and also has a btree index.

For the query above, the index on aggregatorId is not used:

Aggregate  (cost=169995.75..169995.76 rows=1 width=32) (actual time=3904.726..3904.727 rows=1 loops=1)
  ->  Seq Scan on product product0_  (cost=0.00..165510.39 rows=1794146 width=32) (actual time=0.055..2407.195 rows=1851827 loops=1)
        Filter: (active AND ((aggregatorid IS NULL) OR ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))))
        Rows Removed by Filter: 542146
Total runtime: 3904.925 ms

But if we reduce the query by leaving out the NULL check on this column, the index is used:

Aggregate  (cost=17600.93..17600.94 rows=1 width=32) (actual time=614.933..614.935 rows=1 loops=1)
  ->  Index Scan using idx_prod_aggr on product product0_  (cost=0.43..17487.56 rows=45347 width=32) (actual time=19.284..594.509 rows=12099 loops=1)
        Index Cond: ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))
        Filter: active
        Rows Removed by Filter: 49130
Total runtime: 150.255 ms

As far as I know, a btree index can handle IS NULL checks, so I don't understand why the index is not used for the complete query. The product table contains about 2.3 million rows, so the query is not very fast.

EDIT: The index is completely standard:

CREATE INDEX idx_prod_aggr
  ON product
  USING btree
  (aggregatorid COLLATE pg_catalog."default");

Answer 1

Your problem looked interesting, so I reproduced your scenario: Postgres 9.1, a table with 1M rows, one boolean column and one varchar column, both indexed, with half of the table having NULL names.

I got the same EXPLAIN ANALYZE output when the varchar column was not indexed. However, with the index, Postgres uses a bitmap index scan for the NULL condition and another for the IN condition, and then combines them with a BitmapOr.

The boolean condition is then applied as a plain filter on the fetched heap rows (because the two columns have separate indexes).
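To make that plan shape concrete, here is a toy Python model: each bitmap index scan yields a set of candidate row ids, BitmapOr takes their union, and the heap scan then applies the active filter. The row data and the names build_index and bitmap_or_plan are hypothetical illustrations; PostgreSQL's real bitmaps are TID bitmaps over heap pages, not Python sets, so treat this only as a sketch of the logic.

```python
from typing import Dict, List, Optional, Set

Row = Dict[str, object]


def build_index(rows: List[Row], column: str) -> Dict[Optional[str], Set[int]]:
    """Map each value (including None, since a btree also stores NULLs) to row ids."""
    index: Dict[Optional[str], Set[int]] = {}
    for row_id, row in enumerate(rows):
        index.setdefault(row[column], set()).add(row_id)
    return index


def bitmap_or_plan(rows: List[Row], index: Dict[Optional[str], Set[int]],
                   values: List[str]) -> List[int]:
    # Bitmap Index Scan #1: Index Cond (name IS NULL)
    null_bitmap = set(index.get(None, set()))
    # Bitmap Index Scan #2: Index Cond (name = ANY(values))
    in_bitmap: Set[int] = set()
    for value in values:
        in_bitmap |= index.get(value, set())
    # BitmapOr: union of both candidate sets
    candidates = null_bitmap | in_bitmap
    # Bitmap Heap Scan: visit candidates in physical order and apply the
    # remaining Filter (active IS TRUE). These toy sets are exact (never
    # lossy), so the Recheck Cond step is skipped.
    return [rid for rid in sorted(candidates) if rows[rid]["active"] is True]


# Hypothetical sample data mirroring the answer's test table A(active, name)
rows: List[Row] = [
    {"active": True, "name": None},
    {"active": False, "name": None},
    {"active": True, "name": "2"},
    {"active": True, "name": "9"},
    {"active": False, "name": "3"},
]
name_index = build_index(rows, "name")
result = bitmap_or_plan(rows, name_index, ["1", "2", "3"])
print(result)  # [0, 2]
```

The point of the model is that the OR over one column can still be served by a single index, because each branch is scanned separately and the results are merged before any heap row is read.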

explain analyze
select * from A where active is true and ((name is null) OR (name in ('1','2','3')  ));

Output:

"Bitmap Heap Scan on a  (cost=17.34..21.35 rows=1 width=18) (actual time=0.048..0.048 rows=0 loops=1)"
"  Recheck Cond: ((name IS NULL) OR ((name)::text = ANY ('{1,2,3}'::text[])))"
"  Filter: (active IS TRUE)"
"  ->  BitmapOr  (cost=17.34..17.34 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..4.41 rows=1 width=0) (actual time=0.010..0.010 rows=0 loops=1)"
"              Index Cond: (name IS NULL)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..12.93 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)"
"              Index Cond: ((name)::text = ANY ('{1,2,3}'::text[]))"
"Total runtime: 0.077 ms"

This makes me think you left out some details; if so, please add them to your question.

Answer 2

Since the condition in your WHERE clause matches a large share of the table (78% of all rows, according to your numbers), the database concludes that a full table scan is cheaper than spending additional time reading the index.

The rule of thumb with most database vendors is that an index will probably not be used if it can't narrow the search down to about 5% of the table's rows.
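The arithmetic behind this answer can be checked against the row counts in the question's EXPLAIN ANALYZE output. Below is a minimal Python sketch that treats the 5% rule of thumb as a fixed threshold; RULE_OF_THUMB, selectivity and index_likely_useful are hypothetical names, and PostgreSQL itself has no such cut-off (its planner compares the estimated costs of candidate plans). Using the actual counts gives roughly 77%, close to the answer's 78%.

```python
# Hypothetical cut-off taken from the answer's rule of thumb; PostgreSQL
# has no such setting and decides by comparing estimated plan costs.
RULE_OF_THUMB = 0.05


def selectivity(matching_rows: int, total_rows: int) -> float:
    """Fraction of the table selected by a condition."""
    return matching_rows / total_rows


def index_likely_useful(matching_rows: int, total_rows: int,
                        threshold: float = RULE_OF_THUMB) -> bool:
    """Apply the rule of thumb: an index pays off only for small fractions."""
    return selectivity(matching_rows, total_rows) <= threshold


# Seq Scan plan: rows returned plus "Rows Removed by Filter" = whole table.
total_rows = 1_851_827 + 542_146        # 2,393,973 rows
# Rows matching active AND (aggregatorid IS NULL OR aggregatorid IN (...));
# the aggregatorid condition on its own matches at least this many.
with_null_check = 1_851_827
# Index Scan plan: rows fetched via the index before the active filter,
# i.e. rows matching aggregatorid IN (...) only.
in_list_only = 12_099 + 49_130          # 61,229 rows

counts = {"with IS NULL": with_null_check, "IN list only": in_list_only}
for label, n in counts.items():
    print(f"{label}: {selectivity(n, total_rows):.1%} of rows, "
          f"index likely useful: {index_likely_useful(n, total_rows)}")
# with IS NULL: 77.4% of rows, index likely useful: False
# IN list only: 2.6% of rows, index likely useful: True
```

This matches the two plans in the question: the query with the NULL check selects most of the table and gets a sequential scan, while the IN-only query touches under 3% of the rows and gets the index scan.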

2 answers

Answer 1: score 1, posted 2015-06-11 10:07:40

Answer 2: score 1, accepted, posted 2015-06-11 10:20:59
