Why is GIN trigram index not being used with ILIKE ANY clause?
I have a table with the following definition:
CREATE TABLE clients
(
"id" SERIAL PRIMARY KEY,
"email" TEXT NOT NULL,
"first_name" TEXT NOT NULL,
"last_name" TEXT NOT NULL,
"telephone" TEXT,
"city" TEXT NOT NULL,
"street" TEXT NOT NULL,
"house" TEXT NOT NULL,
"apartment" TEXT,
"created_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW(),
"updated_at" TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT NOW()
);
I want to search by partial overlap of any word in a query with most of the fields in the table. I have an index which I intend to use to speed up my search:
create index "clients_text_search_idx" on clients
using gin ((
first_name || ' ' ||
last_name || ' ' ||
coalesce(telephone, '') || ' ' ||
email || ' ' ||
city || ' ' ||
street || ' ' ||
house || ' ' ||
coalesce(apartment, '')
) gin_trgm_ops);
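The gin_trgm_ops operator class used above is provided by the pg_trgm contrib module, so the index definition presumably assumes the extension is already installed in the database. A minimal sketch of that prerequisite, followed by a catalog query to confirm the index was created (the question does not show either step, so treat both as assumptions about the setup):

```sql
-- gin_trgm_ops comes from the pg_trgm extension; install it once per database.
create extension if not exists pg_trgm;

-- List the indexes on the table to confirm clients_text_search_idx exists
-- and to see the exact expression it was built on.
select indexname, indexdef
from pg_indexes
where tablename = 'clients';
```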
I fill my table with realistic fake data, creating more than 100 000 rows. Then I want to see my index being used with the following query, which uses the exact same expression as the index definition:
explain analyse select * from clients where (
first_name || ' ' ||
last_name || ' ' ||
coalesce(telephone, '') || ' ' ||
email || ' ' ||
city || ' ' ||
street || ' ' ||
house || ' ' ||
coalesce(apartment, '')
) ilike any (
select '%' || word || '%'
from regexp_split_to_table('+123 georg', E'\\s+') AS word); -- same as: ilike any (values ('%+123%'), ('%georg%'))
However, what I see is a simple sequential scan (even if I use set enable_seqscan = false):
Gather (cost=1000.00..4027885.59 rows=550 width=135) (actual time=2.083..4542.739 rows=166 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Nested Loop Semi Join (cost=0.00..4026830.59 rows=324 width=135) (actual time=14.319..4503.745 rows=83 loops=2)
" Join Filter: (((((((((((((((clients.first_name || ' '::text) || clients.last_name) || ' '::text) || COALESCE(clients.telephone, ''::text)) || ' '::text) || clients.email) || ' '::text) || clients.city) || ' '::text) || clients.street) || ' '::text) || clients.house) || ' '::text) || COALESCE(clients.apartment, ''::text)) ~~* (('%'::text || word.word) || '%'::text))"
Rows Removed by Join Filter: 109928
-> Parallel Seq Scan on clients (cost=0.00..2540.12 rows=64712 width=135) (actual time=0.032..533.627 rows=55006 loops=2)
-> Function Scan on regexp_split_to_table word (cost=0.00..10.00 rows=1000 width=32) (actual time=0.009..0.028 rows=2 loops=110011)
Planning Time: 3.167 ms
Execution Time: 8.750 ms
But if I replace the whole ilike any ... clause of the previous query with something simple like ilike '%george%', the index is used and the query executes very fast. So why is my index not being used with the ilike any clause?
I use PostgreSQL 11.2 on MacOS Mojave.
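For reference, the single-pattern variant described above would look roughly like the sketch below. It reuses the index expression verbatim, since an expression index can only be considered when the query expression matches the indexed one. The plan you get will depend on your data and settings.

```sql
-- Single ILIKE pattern against the exact indexed expression.
-- With a trigram GIN index this form can be planned as a bitmap index scan.
explain analyse select * from clients where (
    first_name || ' ' ||
    last_name || ' ' ||
    coalesce(telephone, '') || ' ' ||
    email || ' ' ||
    city || ' ' ||
    street || ' ' ||
    house || ' ' ||
    coalesce(apartment, '')
) ilike '%george%';
```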
I have to admit I don't understand why this is happening, but I was able to get it working by making the right-hand side of the ANY an array instead of a subquery, using the ARRAY(<subselect>) form.
testdb=# explain analyse select * from clients where (
first_name || ' ' ||
last_name || ' ' ||
coalesce(telephone, '') || ' ' ||
email || ' ' ||
city || ' ' ||
street || ' ' ||
house || ' ' ||
coalesce(apartment, '')
) ilike any (array(
select ('%' || word || '%')::text
from regexp_split_to_table('foo georg', E'\\s+') AS word));
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on clients (cost=126.79..238.91 rows=489 width=62) (actual time=2.108..18.721 rows=10000 loops=1)
Recheck Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0))
Heap Blocks: exact=84
InitPlan 1 (returns $0)
-> Function Scan on regexp_split_to_table word (cost=0.00..15.00 rows=1000 width=32) (actual time=0.043..0.045 rows=2 loops=1)
-> Bitmap Index Scan on clients_text_search_idx (cost=0.00..111.67 rows=489 width=0) (actual time=2.060..2.060 rows=10000 loops=1)
Index Cond: (((((((((((((((first_name || ' '::text) || last_name) || ' '::text) || COALESCE(telephone, ''::text)) || ' '::text) || email) || ' '::text) || city) || ' '::text) || street) || ' '::text) || house) || ' '::text) || COALESCE(apartment, ''::text)) ~~* ANY ($0))
Planning time: 0.297 ms
Execution time: 19.254 ms
(9 rows)
This is with enable_seqscan = off on pg 10.12 (all that I have available right now).
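A likely explanation, offered as my understanding rather than something confirmed in this thread: when the right-hand side of ANY is a subquery (including a VALUES list), the planner treats it as a semi-join against that subquery, as the Nested Loop Semi Join in the question's plan shows, and the ILIKE ends up as a join filter. When the right-hand side is an array, the condition is an ordinary "expression operator ANY (array)" clause that can be matched against the index. The sketch below shows two other array-style ways to write the same search, using the question's example words as literals:

```sql
-- Variant 1: a literal array on the right-hand side of ANY.
explain analyse select * from clients where (
    first_name || ' ' ||
    last_name || ' ' ||
    coalesce(telephone, '') || ' ' ||
    email || ' ' ||
    city || ' ' ||
    street || ' ' ||
    house || ' ' ||
    coalesce(apartment, '')
) ilike any (array['%+123%', '%georg%']);

-- Variant 2: spell the patterns out with OR; each branch is a plain
-- ILIKE on the indexed expression, which the planner can combine
-- with a BitmapOr over the same index.
explain analyse select * from clients where (
    first_name || ' ' ||
    last_name || ' ' ||
    coalesce(telephone, '') || ' ' ||
    email || ' ' ||
    city || ' ' ||
    street || ' ' ||
    house || ' ' ||
    coalesce(apartment, '')
) ilike '%+123%'
or (
    first_name || ' ' ||
    last_name || ' ' ||
    coalesce(telephone, '') || ' ' ||
    email || ' ' ||
    city || ' ' ||
    street || ' ' ||
    house || ' ' ||
    coalesce(apartment, '')
) ilike '%georg%';
```

Note that a search word containing % or _ acts as a wildcard in either form, so user input may need escaping before the patterns are built.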
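It looks like your query is running in parallel mode, but with only one worker, which doesn't make sense to me. It doesn't explain why the index isn't used, but it looks odd.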