Postgres pg_trgm GIN index ignored in a specific join
I have a table item with multiple text fields, like name, unique_attr, category, etc., and I've indexed all of them using GIN (gin_trgm_ops) indexes for faster ilike queries. Indeed, even with a join to the table inventory_membership, the indexes are used and speed up the execution time. Output of my explain:
explain analyze select i.* from item i
join inventory_membership im on im.inventory_id = i.inventory_id
where i.name ilike '%blu%' or unique_attr ilike '%blu%' or category ilike '%blu%'
or brand ilike '%blu%';
Hash Join (cost=98.64..4584.98 rows=87302 width=478) (actual time=4.258..30.393 rows=57584 loops=1)
  Hash Cond: (i.inventory_id = im.inventory_id)
  -> Bitmap Heap Scan on item i (cost=95.45..3584.23 rows=4982 width=478) (actual time=3.706..10.529 rows=3340 loops=1)
        Recheck Cond: ((name ~~* '%blu%'::text) OR (unique_attr ~~* '%blu%'::text) OR (category ~~* '%blu%'::text) OR (brand ~~* '%blu%'::text))
        Heap Blocks: exact=715
        -> BitmapOr (cost=95.45..95.45 rows=5130 width=0) (actual time=3.622..3.622 rows=0 loops=1)
              -> Bitmap Index Scan on item_name_idx (cost=0.00..42.97 rows=3596 width=0) (actual time=1.612..1.612 rows=3160 loops=1)
                    Index Cond: (name ~~* '%blu%'::text)
              -> Bitmap Index Scan on item_unique_attr_idx (cost=0.00..12.01 rows=1 width=0) (actual time=0.586..0.586 rows=32 loops=1)
                    Index Cond: (unique_attr ~~* '%blu%'::text)
              -> Bitmap Index Scan on item_category_idx (cost=0.00..22.78 rows=1437 width=0) (actual time=0.888..0.888 rows=1394 loops=1)
                    Index Cond: (category ~~* '%blu%'::text)
              -> Bitmap Index Scan on item_brand_idx (cost=0.00..12.72 rows=96 width=0) (actual time=0.532..0.532 rows=42 loops=1)
                    Index Cond: (brand ~~* '%blu%'::text)
  -> Hash (cost=1.97..1.97 rows=97 width=4) (actual time=0.059..0.060 rows=87 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 12kB
        -> Seq Scan on inventory_membership im (cost=0.00..1.97 rows=97 width=4) (actual time=0.010..0.032 rows=87 loops=1)
Planning Time: 0.924 ms
Execution Time: 42.093 ms
We can see the item_name_idx, item_unique_attr_idx, item_category_idx and item_brand_idx GIN indexes are being used for the index conditions. Great.
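Why can a leading-wildcard pattern like ilike '%blu%' use a gin_trgm_ops index at all? The Python below is a simplified model of pg_trgm's trigram matching, not pg_trgm's actual code; the helper names (word_trigrams, infix_pattern_trigrams, trigram_search) are hypothetical, and it assumes ASCII text and a single alphanumeric search fragment (real pg_trgm uses the locale's notion of alphanumeric characters and handles arbitrary patterns). The index phase keeps every row whose trigram set contains all of the pattern's trigrams, which can yield false positives; weeding those out is the job of the Recheck Cond step in the Bitmap Heap Scan above.

```python
import re


def word_trigrams(text: str) -> set[str]:
    """Simplified model of pg_trgm's show_trgm(): lowercase the text, split it
    into alphanumeric words, pad each word with two leading spaces and one
    trailing space, and collect every 3-character window."""
    grams: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            grams.add(padded[i:i + 3])
    return grams


def infix_pattern_trigrams(fragment: str) -> set[str]:
    """Trigrams required by ILIKE '%fragment%' (simplified: one alphanumeric
    fragment). The % on each side means the fragment may sit mid-word, so no
    word-boundary padding is added."""
    f = fragment.lower()
    return {f[i:i + 3] for i in range(len(f) - 2)}


def trigram_search(rows: list[str], fragment: str) -> tuple[list[str], list[str]]:
    """Return (index_candidates, rechecked_matches) for ILIKE '%fragment%'."""
    needed = infix_pattern_trigrams(fragment)
    # "Index" phase: a row qualifies if it contains every required trigram.
    candidates = [r for r in rows if needed <= word_trigrams(r)]
    # Recheck phase (the plan's "Recheck Cond"): confirm the real ILIKE match.
    matches = [r for r in candidates if fragment.lower() in r.lower()]
    return candidates, matches


if __name__ == "__main__":
    print(sorted(word_trigrams("blue")))
    print(infix_pattern_trigrams("blu"))
    print(trigram_search(["Blue Shirt", "red cap", "blu luex"], "blue"))
```

Here word_trigrams("blue") yields the same five trigrams that show_trgm('blue') reports ("  b", " bl", "blu", "lue", "ue "), and '%blu%' needs only the single trigram "blu". The row "blu luex" contains both "blu" and "lue" in separate words, so it survives the index phase but is rejected by the recheck.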
However, when I join another table (the inventory table, which only has id and name columns), the indexes disappear. Explain:
explain analyze select i.* from item i
join inventory inv on inv.id = i.inventory_id
join inventory_membership im on im.inventory_id = i.inventory_id
where i.name ilike '%blu%' or unique_attr ilike '%blu%' or category ilike '%blu%' or brand
ilike '%blu%';
Hash Join (cost=4.67..1172.61 rows=60407 width=478) (actual time=0.775..121.787 rows=57584 loops=1)
  Hash Cond: (inv.id = im.inventory_id)
  -> Merge Join (cost=1.49..440.81 rows=4982 width=482) (actual time=0.111..101.857 rows=3340 loops=1)
        Merge Cond: (i.inventory_id = inv.id)
        -> Index Scan using item_inventory_id_idx on item i (cost=0.29..13946.60 rows=4982 width=478) (actual time=0.085..99.857 rows=3340 loops=1)
              Filter: ((name ~~* '%blu%'::text) OR (unique_attr ~~* '%blu%'::text) OR (category ~~* '%blu%'::text) OR (brand ~~* '%blu%'::text))
              Rows Removed by Filter: 34858
        -> Sort (cost=1.20..1.22 rows=8 width=4) (actual time=0.020..0.025 rows=8 loops=1)
              Sort Key: inv.id
              Sort Method: quicksort Memory: 25kB
              -> Seq Scan on inventory inv (cost=0.00..1.08 rows=8 width=4) (actual time=0.006..0.009 rows=8 loops=1)
  -> Hash (cost=1.97..1.97 rows=97 width=4) (actual time=0.650..0.651 rows=87 loops=1)
        Buckets: 1024 Batches: 1 Memory Usage: 12kB
        -> Seq Scan on inventory_membership im (cost=0.00..1.97 rows=97 width=4) (actual time=0.005..0.028 rows=87 loops=1)
Planning Time: 7.193 ms
Execution Time: 132.427 ms
And you can see the GIN indexes are gone; the only index the plan uses is item_inventory_id_idx, which is the regular FK btree index. Also, the execution time went through the roof. Why?
You note that you are mostly interested in the inventory name, and that there are only 8 rows in the inventory table. Those 8 rows are why the query planner prefers a merge join over a hash join here. A merge join needs both inputs sorted on the join key: sorting 8 inventory rows is trivial, and the btree index item_inventory_id_idx returns item rows already in inventory_id order. The bitmap scans on your GIN indexes return rows in physical (heap) order instead, so using them would have required an extra sort, and the planner thought that would be less efficient. As a result, the ilike conditions are applied as a plain Filter on every row the index scan returns. In practice that index scan read 38,198 rows and discarded 34,858 of them (Rows Removed by Filter), taking about 100 ms of the 132 ms total.
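To make the ordering requirement concrete, here is a textbook Python sketch of the two join algorithms (not PostgreSQL's executor code; merge_join and hash_join are illustrative names, and the sample rows are made up). The merge join walks two key-sorted inputs in lockstep, so it silently produces wrong results on unsorted input; the hash join builds a lookup table from one side and has no ordering requirement.

```python
from collections import defaultdict


def merge_join(left, right, key_l, key_r):
    """Merge join: both inputs MUST already be sorted on their join keys."""
    out = []
    i = j = 0
    while i < len(left) and j < len(right):
        kl, kr = key_l(left[i]), key_r(right[j])
        if kl < kr:
            i += 1
        elif kl > kr:
            j += 1
        else:
            # Emit every right row with this key, then rewind the right side
            # if the next left row carries the same key (duplicate keys).
            j_start = j
            while j < len(right) and key_r(right[j]) == kl:
                out.append((left[i], right[j]))
                j += 1
            i += 1
            if i < len(left) and key_l(left[i]) == kl:
                j = j_start
    return out


def hash_join(build, probe, key_build, key_probe):
    """Hash join: index the (small) build side once, then probe it per row.
    No ordering is required on either input."""
    table = defaultdict(list)
    for row in build:
        table[key_build(row)].append(row)
    return [(p, b) for p in probe for b in table.get(key_probe(p), [])]


def row_key(row):
    return row[0]


if __name__ == "__main__":
    items = [(20, "blue cap"), (10, "blue shirt"), (10, "navy blue sock")]  # (inventory_id, name)
    inventory = [(10, "Main"), (20, "Outlet"), (30, "Spare")]                 # (id, name)
    by_merge = merge_join(sorted(items, key=row_key), sorted(inventory, key=row_key), row_key, row_key)
    by_hash = hash_join(inventory, items, row_key, row_key)
    print(by_merge == sorted(by_hash))                       # True
    print(len(merge_join(items, inventory, row_key, row_key)))  # 1: unsorted input loses rows
```

This is the trade-off the planner faced: feeding a merge join needs item in inventory_id order, which item_inventory_id_idx supplies for free, while the GIN bitmap scans would have needed a separate sort first.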
Now, without your data I cannot tell which option will be faster, but there are several things you can try. The first, which you already tried, is to fetch the inventory name in a scalar subquery:
SELECT i.*, (select name from inventory where id = i.inventory_id) as inventoryName
FROM item i
JOIN inventory_membership im ON im.inventory_id = i.inventory_id
WHERE i.name ilike '%blu%' or unique_attr ilike '%blu%' or category ilike '%blu%'
or brand ilike '%blu%';
But that means this select statement is executed 57k times, once for each row.
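A rough Python model of why that adds up (illustrative only; the function names are hypothetical, and it assumes each subquery execution scans all 8 inventory rows, which is plausible for a table that small but is an assumption about the plan): the correlated subquery repeats its inner scan for every one of the 57,584 output rows, while a hash join builds its lookup table once and probes it per row.

```python
def correlated_subquery(items, inventory):
    """Model of SELECT ..., (SELECT name FROM inventory WHERE id = i.inventory_id):
    the inner scan is repeated once per outer row."""
    comparisons = 0
    result = []
    for inv_id, item_name in items:
        match = None
        for row_id, inv_name in inventory:  # inner seq scan, re-run every time
            comparisons += 1
            if row_id == inv_id:
                match = inv_name
        result.append((item_name, match))
    return result, comparisons


def hashed_lookup(items, inventory):
    """Model of a hash join: build a table from inventory once, then probe it."""
    table = {row_id: inv_name for row_id, inv_name in inventory}
    result = [(item_name, table.get(inv_id)) for inv_id, item_name in items]
    steps = len(inventory) + len(items)  # one insert per build row, one probe per item
    return result, steps


if __name__ == "__main__":
    inventory = [(n, f"inventory {n}") for n in range(1, 9)]       # 8 rows, as in the question
    items = [((k % 8) + 1, f"item {k}") for k in range(57_584)]     # 57,584 output rows
    _, comparisons = correlated_subquery(items, inventory)
    _, steps = hashed_lookup(items, inventory)
    print(comparisons, steps)  # 460672 57592
```

Each individual execution is cheap, so whether this matters in practice depends on per-execution overhead; it is worth comparing with EXPLAIN ANALYZE rather than assuming.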
The second is to use the query you had, but check whether joining inventory_membership on inv.id instead of i.inventory_id changes anything.
SELECT i.*, inv.name as inventoryName
FROM item i
JOIN inventory inv ON inv.id = i.inventory_id
JOIN inventory_membership im ON im.inventory_id = inv.id -- <- this changed
WHERE i.name ilike '%blu%' or unique_attr ilike '%blu%' or category ilike '%blu%'
or brand ilike '%blu%';
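Finally, as mentioned in this question, you can force the first query to be evaluated before the inventory name is fetched, using a CTE or a subquery with OFFSET 0. Note that on PostgreSQL 12 and later, a CTE referenced only once is inlined into the outer query by default, so it acts as this kind of fence only when declared AS MATERIALIZED.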
WITH my_items AS MATERIALIZED ( -- MATERIALIZED needs PostgreSQL 12+; older versions always materialize CTEs
SELECT i.*
FROM item i
JOIN inventory_membership im ON im.inventory_id = i.inventory_id
WHERE i.name ilike '%blu%' or unique_attr ilike '%blu%' or category ilike '%blu%'
or brand ilike '%blu%'
)
SELECT i.*, inv.name as inventoryName
FROM my_items i
JOIN inventory inv ON inv.id = i.inventory_id
or
SELECT i.*, inv.name as inventoryName
FROM (
SELECT i.*
FROM item i
JOIN inventory_membership im ON im.inventory_id = i.inventory_id
WHERE i.name ilike '%blu%' or unique_attr ilike '%blu%' or category ilike '%blu%'
or brand ilike '%blu%'
OFFSET 0 -- <- this forces the subquery to be evaluated separately from the rest of the query
) i
JOIN inventory inv ON inv.id = i.inventory_id