简体   繁体   English

为什么 GIN 三元组索引不与 ILIKE ANY 子句一起使用?

[英]Why is GIN trigram index not being used with ILIKE ANY clause?

I have a table with the following definition:我有一个具有以下定义的表:

CREATE TABLE clients
        (
            "id" SERIAL PRIMARY KEY,
            "email" TEXT NOT NULL,
            "first_name" TEXT NOT NULL,
            "last_name" TEXT NOT NULL,
            "telephone" TEXT,
            "city" TEXT NOT NULL,
            "street" TEXT NOT NULL,
            "house" TEXT NOT NULL,
            "apartment" TEXT,
            "created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
            "updated_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
        );

I want to search by partial overlap of any words in a query with any of most of the fields in the table.我想通过查询中任何单词与表中大多数字段的部分重叠进行搜索。 I have an index which I intent to use to speed up my search:我有一个索引,我打算用它来加速我的搜索:

create index "clients_text_search_idx" on clients
    using gin ((
        first_name || ' ' ||
        last_name || ' ' ||
        coalesce(telephone, '') || ' ' ||
        email || ' ' ||
        city || ' ' ||
        street || ' ' ||
        house || ' ' ||
        coalesce(apartment, '')
        ) gin_trgm_ops);

I fill my table up with realistic fake data, creating more than 100 000 rows.我用真实的假数据填满了我的表格,创建了超过 100 000 行。 Then I want to see my index being used with a following query, using the exact same expression that I did when creating my index:然后我想看到我的索引与以下查询一起使用,使用与创建索引时完全相同的表达式:

explain analyse select * from clients where (
        first_name || ' ' ||
        last_name || ' ' ||
        coalesce(telephone, '') || ' ' ||
        email || ' ' ||
        city || ' ' ||
        street || ' ' ||
        house || ' ' ||
        coalesce(apartment, '')
        ) ilike any (
    select '%' || word || '%'
    from regexp_split_to_table('+123 georg', E'\\s+') AS word); -- same as: ilike any (values ('%+123%'), ('%georg%'))

However what I see is simple sequential scan (even if I use set enable_seqscan = false ):然而,我看到的是简单的顺序扫描(即使我使用set enable_seqscan = false ):

Gather  (cost=1000.00..4027885.59 rows=550 width=135) (actual time=2.083..4542.739 rows=166 loops=1)
  Workers Planned: 1
  Workers Launched: 1
  ->  Nested Loop Semi Join  (cost=0.00..4026830.59 rows=324 width=135) (actual time=14.319..4503.745 rows=83 loops=2)
"        Join Filter: (((((((((((((((clients.first_name || ' '::text) || clients.last_name) || ' '::text) || COALESCE(clients.telephone, ''::text)) || ' '::text) || clients.email) || ' '::text) || clients.city) || ' '::text) || clients.street) || ' '::text) || clients.house) || ' '::text) || COALESCE(clients.apartment, ''::text)) ~~* (('%'::text || word.word) || '%'::text))"
        Rows Removed by Join Filter: 109928
        ->  Parallel Seq Scan on clients  (cost=0.00..2540.12 rows=64712 width=135) (actual time=0.032..533.627 rows=55006 loops=2)
        ->  Function Scan on regexp_split_to_table word  (cost=0.00..10.00 rows=1000 width=32) (actual time=0.009..0.028 rows=2 loops=110011)
Planning Time: 3.167 ms
Execution Time: 8.750 ms

But if I replace the whole ilike any ... clause of the previous query with something simple like ilike '%george%' , the index is getting used and the query gets executed super fast.但是,如果我用ilike '%george%'类的简单内容替换上一个查询的整个ilike any ...子句,则索引正在被使用并且查询的执行速度非常快。 So why is my index not being used with ilike any clause?那么为什么我的索引没有与ilike any子句一起使用?

I use PostgreSQL 11.2 on MacOS Mojave.我在 MacOS Mojave 上使用 PostgreSQL 11.2。

I have to admit I don't understand why this is happening, but I was able to get it working by making the right-hand side of the ANY an array instead of a subquery using the ARRAY(<subselect>) form .我不得不承认我不明白为什么会发生这种情况,但是我能够通过使用ARRAY(<subselect>)表单ANY的右侧设为数组而不是子查询来使其工作。

testdb=# explain analyse select * from clients where (
        first_name || ' ' ||
        last_name || ' ' ||
        coalesce(telephone, '') || ' ' ||
        email || ' ' ||
        city || ' ' ||
        street || ' ' ||
        house || ' ' ||
        coalesce(apartment, '')
        ) ilike any (array(
    select ('%' || word || '%')::text
    from regexp_split_to_table('foo georg', E'\\s+') AS word));
                                                                                                                                       QUERY PLAN                                                                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on clients  (cost=126.79..238.91 rows=489 width=62) (actual time=2.108..18.721 rows=10000 loops=1)
   Recheck Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0))
   Heap Blocks: exact=84
   InitPlan 1 (returns $0)
     ->  Function Scan on regexp_split_to_table word  (cost=0.00..15.00 rows=1000 width=32) (actual time=0.043..0.045 rows=2 loops=1)
   ->  Bitmap Index Scan on clients_text_search_idx  (cost=0.00..111.67 rows=489 width=0) (actual time=2.060..2.060 rows=10000 loops=1)
         Index Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0))
 Planning time: 0.297 ms
 Execution time: 19.254 ms
(9 rows)

This is with enable_seqscan = off on pg 10.12 (all that I have available right now).这是在 pg 10.12 上enable_seqscan = off (我现在可用的所有内容)。

看起来您的查询以并行模式运行,但只有一个工作人员,这对我来说没有意义:它没有解释为什么不使用索引,但看起来很奇怪。

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM