Why doesn't this query use the index?

Question

I'm running into strange behavior from the Postgres optimizer with the following query:

select count(product0_.id) as col_0_0_ from Product product0_ 
 where product0_.active=true 
 and (product0_.aggregatorId is null 
 or product0_.aggregatorId in ($1 , $2 , $3))

Product has about 54 columns; active is a boolean with a btree index, and aggregatorId is a varchar(15) that also has a btree index.

In the query above, the index on aggregatorId is not used:

Aggregate  (cost=169995.75..169995.76 rows=1 width=32) (actual time=3904.726..3904.727 rows=1 loops=1)
  ->  Seq Scan on product product0_  (cost=0.00..165510.39 rows=1794146 width=32) (actual time=0.055..2407.195 rows=1851827 loops=1)
        Filter: (active AND ((aggregatorid IS NULL) OR ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))))
        Rows Removed by Filter: 542146
Total runtime: 3904.925 ms

But if we reduce the query by omitting the null check on that column, the index is used:

Aggregate  (cost=17600.93..17600.94 rows=1 width=32) (actual time=614.933..614.935 rows=1 loops=1)
  ->  Index Scan using idx_prod_aggr on product product0_  (cost=0.43..17487.56 rows=45347 width=32) (actual time=19.284..594.509 rows=12099 loops=1)
      Index Cond: ((aggregatorid)::text = ANY ('{5109037,5001015,70601}'::text[]))
      Filter: active
    Rows Removed by Filter: 49130
Total runtime: 150.255 ms

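As far as I know, a btree index can handle null checks, so I don't understand why the index isn't used for the full query. The product table contains about 2.3 million rows, so the query is not very fast.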

Edit: the index is quite standard:

CREATE INDEX idx_prod_aggr
  ON product
  USING btree
  (aggregatorid COLLATE pg_catalog."default");

Answer 1

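Your question looked interesting, so I reproduced your scenario: Postgres 9.1, a table with 1M rows, one boolean column and one varchar column, both indexed, with NULL names in half of the table.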
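The answer does not show how the test table was built. A minimal sketch of a comparable setup might look like the following; the column types, the data distribution, and the name idx_a_active are assumptions, and only the table name A, the columns active and name, and the index name idx_prod_aggr appear in the output further down. Whether the planner picks the same plan depends on the Postgres version and on the collected statistics.

```sql
-- Hypothetical reconstruction of the test table; the original DDL is not shown.
CREATE TABLE a (
    id     serial PRIMARY KEY,
    active boolean NOT NULL,
    name   varchar(15)
);

-- 1M rows; every second row gets a NULL name (assumed distribution).
INSERT INTO a (active, name)
SELECT (g % 3 = 0),
       CASE WHEN g % 2 = 0 THEN NULL ELSE g::text END
FROM generate_series(1, 1000000) AS g;

-- Separate single-column btree indexes, as described in the answer.
CREATE INDEX idx_a_active ON a (active);
CREATE INDEX idx_prod_aggr ON a (name);

-- Refresh planner statistics before running EXPLAIN ANALYZE.
ANALYZE a;
```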

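When the varchar column was not indexed, I got the same EXPLAIN ANALYZE output as you. With the index in place, however, Postgres uses a bitmap index scan for the NULL condition and another for the IN condition, then combines them with a BitmapOr.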

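It then checks the boolean condition as a filter on the rows it fetched, rather than through an index (because the two indexes are separate).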

explain analyze
select * from A where active is true and ((name is null) OR (name in ('1','2','3')  ));

See the output:

"Bitmap Heap Scan on a  (cost=17.34..21.35 rows=1 width=18) (actual time=0.048..0.048 rows=0 loops=1)"
"  Recheck Cond: ((name IS NULL) OR ((name)::text = ANY ('{1,2,3}'::text[])))"
"  Filter: (active IS TRUE)"
"  ->  BitmapOr  (cost=17.34..17.34 rows=1 width=0) (actual time=0.047..0.047 rows=0 loops=1)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..4.41 rows=1 width=0) (actual time=0.010..0.010 rows=0 loops=1)"
"              Index Cond: (name IS NULL)"
"        ->  Bitmap Index Scan on idx_prod_aggr  (cost=0.00..12.93 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)"
"              Index Cond: ((name)::text = ANY ('{1,2,3}'::text[]))"
"Total runtime: 0.077 ms"

This makes me think you have left out some details; if so, please add them to your question.

Answer 2

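Because the columns in your where clause contain many identical values (by your numbers, the query matches about 78% of all table rows), the database concludes that a full table scan is cheaper than spending extra time reading the index.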

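A rule of thumb for most database vendors is that an index probably won't be used if it can't narrow the search down to roughly 5% of all table records.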
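One way to check this reasoning against the actual table is to count how many rows each part of the where clause matches, and then compare the planner's cost for the index-based plan. The sketch below assumes the table and column names from the question and uses the three literal values visible in its EXPLAIN output in place of $1, $2, $3. The enable_seqscan setting only discourages sequential scans for the current session and is meant for diagnosis, not as a permanent fix.

```sql
-- How much of the table does each part of the WHERE clause match?
SELECT count(*) AS total_rows,
       sum(CASE WHEN active THEN 1 ELSE 0 END) AS active_rows,
       sum(CASE WHEN aggregatorid IS NULL THEN 1 ELSE 0 END) AS null_rows,
       sum(CASE WHEN aggregatorid IN ('5109037', '5001015', '70601')
                THEN 1 ELSE 0 END) AS in_list_rows
FROM product;

-- For comparison only: make sequential scans unattractive in this session,
-- then look at the cost and timing of the plan the planner falls back to.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(id)
FROM product
WHERE active = true
  AND (aggregatorid IS NULL
       OR aggregatorid IN ('5109037', '5001015', '70601'));

-- Restore the default planner behavior.
RESET enable_seqscan;
```

If null_rows turns out to be a large share of total_rows, the index-based plan will usually show a higher estimated cost than the sequential scan, which would be consistent with the explanation above.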


Answer 1
1 2015-06-11 10:07:40

Answer 2
1 accepted 2015-06-11 10:20:59
