简体   繁体   English

未使用Postgres的gin_trgm_ops索引

[英]Postgres `gin_trgm_ops` index not being used

I'm trying to speed up some text matching in Postgres, using the pg_trgm extensions: 我正在尝试使用pg_trgm扩展来加快 Postgres中的某些文本匹配:

CREATE TABLE test3 (id bigint, key text, value text);

insert into test3 values (1, 'first 1', 'second 3');
insert into test3 values (2, 'first 1', 'second 2');
insert into test3 values (2, 'first 2', 'second 3');
insert into test3 values (3, 'first 1', 'second 2');
insert into test3 values (3, 'first 1', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 2', 'second 3');
insert into test3 values (4, 'first 1', 'second 2');
insert into test3 values (4, 'first 1', 'second 2');

-- repeat the above 1,000,000x times, to have more rows for benchmarking
insert into test3(id, key, value) select id, key, value from test3 cross join generate_series(1, 1000000);

Now I query this table with ILIKE : 现在我用ILIKE查询该表:

select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
Time: 918.265 ms

To see if indexing would speed this up, I added pg_trgm on both key and value columns: 为了查看索引是否可以加快速度,我在keyvalue列上都添加了pg_trgm

CREATE extension if not exists pg_trgm;
CREATE INDEX test3_key_trgm_idx ON test3 USING gin (key gin_trgm_ops);
CREATE INDEX test3_value_trgm_idx ON test3 USING gin (value gin_trgm_ops);

But the query still takes the same time, and EXPLAIN ANALYZE shows the indexes are not being used at all: 但是查询仍然需要相同的时间,并且EXPLAIN ANALYZE显示根本没有使用索引:

explain analyze select count(*) from test3 where key = 'first 1' and value ilike '%nd 3%';
                                                                 QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=126905.14..126905.15 rows=1 width=8) (actual time=1017.666..1017.667 rows=1 loops=1)
   ->  Gather  (cost=126904.93..126905.14 rows=2 width=8) (actual time=1017.505..1018.778 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Partial Aggregate  (cost=125904.93..125904.94 rows=1 width=8) (actual time=1010.862..1010.862 rows=1 loops=3)
               ->  Parallel Seq Scan on test3  (cost=0.00..122427.06 rows=1391148 width=0) (actual time=0.041..973.550 rows=666667 loops=3)
                     Filter: ((value ~~* '%nd 3%'::text) AND (key = 'first 1'::text))
                     Rows Removed by Filter: 2333336
 Planning Time: 0.266 ms
 Execution Time: 1018.814 ms

Time: 1049.413 ms (00:01.049)

Note the sequential scan. 注意顺序扫描。 What gives? 是什么赋予了?

Never mind, I found the issue. 没关系,我发现了问题。

The query planner was smarter than my toy test set; 查询计划者比我的玩具测试集还聪明。 seeing as most rows match the query, it went for a sequential scan. 看到大多数行都与查询匹配,因此进行了顺序扫描。

If I try with ilike '%nd 0%' instead, no rows match and EXPLAIN ANALYZE reports Bitmap Index Scan on test3_value_trgm_idx correctly. 如果我尝试使用ilike '%nd 0%' ,则没有行匹配,并且EXPLAIN ANALYZE正确报告Bitmap Index Scan on test3_value_trgm_idx

So, normalizing the original JSONB in this manner works. 因此,以这种方式规范化原始JSONB是可行的。 But I'll also try to find and compare another way, using regular expressions over TEXT , to avoid having to create and maintain another table. 但是,我还将尝试查找和比较另一种方法,即通过TEXT使用正则表达式,以避免必须创建和维护另一个表。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM