PostgreSQL: How to optimize an index for a substring of a column?

Question

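How can I optimize an index for a substring of a column?

For example, a column postal_code stores 5-character strings. If most of my queries filter on the first 2 characters, an index on the whole column is of no use.

What if I create an index on the substring only: CREATE INDEX ON index.annonces_parsed (left(postal_code, 2))

Is this a good solution, or is it better to add a new column that stores only the substring and put an index on it?

A query using the index could be: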

select *
from index.cities
where left(postal_code, 2) = '83' --- Will it use the index on the substring ?

Thanks a lot.

Answer 1

I have a test table with 20 million records.

Test 1

CREATE INDEX test_a1_idx ON test (a1)

explain analyze 
select * from test 
where left(a1, 2) = '58'

Gather  (cost=1000.00..103565.05 rows=40000 width=12) (actual time=0.429..468.428 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..98565.05 rows=16667 width=12) (actual time=0.114..407.330 rows=29904 loops=3)
        Filter: ("left"(a1, 2) = '58'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.424 ms
Execution Time: 470.472 ms


explain analyze 
select * from test 
where a1 like '58%'

Gather  (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.990..337.339 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.233..278.740 rows=29904 loops=3)
        Filter: (a1 ~~ '58%'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.092 ms
Execution Time: 339.259 ms

Test 2

CREATE INDEX test_a1_idx1 ON test (left(a1, 2))

explain analyze 
select * from test 
where left(a1, 2) = '58'

Bitmap Heap Scan on test  (cost=446.43..49455.46 rows=40000 width=12) (actual time=10.507..206.800 rows=89712 loops=1)
  Recheck Cond: ("left"(a1, 2) = '58'::text)
  Heap Blocks: exact=38298
  ->  Bitmap Index Scan on test_a1_idx1  (cost=0.00..436.43 rows=40000 width=0) (actual time=5.450..5.450 rows=89712 loops=1)
        Index Cond: ("left"(a1, 2) = '58'::text)
Planning Time: 0.501 ms
Execution Time: 209.217 ms
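
The plan above estimates 40000 rows while 89712 are actually returned. A plausible explanation, stated here as an assumption, is that the planner has no statistics for the indexed expression yet and falls back to a default selectivity. PostgreSQL gathers statistics for index expressions when the table is analyzed, so a sketch of the follow-up step would be:

```sql
-- Refresh planner statistics; this also collects statistics for the
-- expression left(a1, 2) indexed by test_a1_idx1.
ANALYZE test;

-- Re-check the plan; the row estimate should move closer to the actual count.
EXPLAIN ANALYZE
SELECT * FROM test
WHERE left(a1, 2) = '58';
```

Whether the plan shape changes depends on the data distribution; only the estimate is expected to improve.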

explain analyze 
select * from test 
where a1 like '58%'

Gather  (cost=1000.00..99284.01 rows=80523 width=12) (actual time=0.341..334.759 rows=89712 loops=1)
  Workers Planned: 2
  Workers Launched: 2
  ->  Parallel Seq Scan on test  (cost=0.00..90231.71 rows=33551 width=12) (actual time=0.110..287.313 rows=29904 loops=3)
        Filter: (a1 ~~ '58%'::text)
        Rows Removed by Filter: 2636765
Planning Time: 0.067 ms
Execution Time: 336.762 ms
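
The LIKE '58%' query falls back to a sequential scan in both tests, even with the plain index on a1. One likely reason, which is an assumption because the database collation is not shown, is that the database uses a non-C locale, where the default B-tree operator class cannot serve LIKE prefix patterns. A minimal sketch of the usual workaround, with test_a1_pattern_idx as a hypothetical index name:

```sql
-- Assumption: the database uses a non-C collation, so the default
-- B-tree operator class cannot be used for LIKE prefix matching.
-- text_pattern_ops compares values character by character instead.
CREATE INDEX test_a1_pattern_idx ON test (a1 text_pattern_ops);

-- A left-anchored pattern can now be planned as an index range scan.
EXPLAIN ANALYZE
SELECT * FROM test
WHERE a1 LIKE '58%';
```

This only helps left-anchored patterns such as '58%'; a pattern like '%58' still cannot use a B-tree index.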

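Result:

Note that when a condition wraps the column in a function, the database does not use the plain index on that column. A function (expression) index on the same expression therefore performs much better in these cases: in the tests above, the left(a1, 2) = '58' query drops from about 470 ms with a sequential scan to about 209 ms with the expression index.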
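
For comparison, the alternative raised in the question, a separate column that stores only the prefix, could be sketched as follows. This assumes PostgreSQL 12 or later, where stored generated columns are available; a1_prefix and test_a1_prefix_idx are hypothetical names, not part of the test above.

```sql
-- Assumption: PostgreSQL 12+ (stored generated columns).
-- a1_prefix and test_a1_prefix_idx are hypothetical names.
-- Note: adding a stored generated column rewrites the table.
ALTER TABLE test
    ADD COLUMN a1_prefix text GENERATED ALWAYS AS (left(a1, 2)) STORED;

CREATE INDEX test_a1_prefix_idx ON test (a1_prefix);

-- Queries filter on the new column instead of repeating the expression.
SELECT * FROM test
WHERE a1_prefix = '58';
```

The trade-off is extra storage in every row in exchange for a simpler predicate. The expression index avoids the extra column, but each query has to repeat the indexed expression exactly for the index to be considered.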

Solution 1
1 2022-10-08 14:03:45

POSTGRESQL：如何針對substring列優化索引？

問題描述

1 個解決方案

解決方案1 1 2022-10-08 14:03:45

解決方案1
1 2022-10-08 14:03:45