
Correct Postgres full text search indexes

I'm creating a multi-column full text search index, and currently I have this running:

CREATE INDEX products_search_document ON products
USING gin(to_tsvector('english', style_number || ' ' || brand || ' ' || style_description || ' ' || color));

This works great for queries like this one:

SELECT * FROM "products"
WHERE (to_tsvector('english', style_number||' '||brand||' '||style_description||' '||color)
      @@ to_tsquery('english', 'G2000'))

I'd like to use prefix matching now, though, so that my query would look like this:

SELECT * FROM "products"
WHERE (to_tsvector('english', style_number||' '||brand||' '||style_description||' '||color)
      @@ to_tsquery('english', 'G2000:*'))

When I run this on my Heroku Postgres instance, I get a Seq Scan on products instead of an index scan.

What other index would I need in order to use the prefix matcher in Postgres?

Oddly enough, I dropped the index and recreated it... and that fixed the problem.

Have you tried doing:

set enable_seqscan=off; 

and then running your query to see if it uses the index? I don't see why it wouldn't. My suspicion is that the planner thinks that particular search is not selective enough, and so considers a sequential scan more efficient than a full-text index scan.
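As a rough sketch of that check, the setting can be scoped to a single transaction so it does not leak into the rest of the session. The query is the one from the question; wrapping it in a transaction with SET LOCAL is my own choice here, not something the question requires.

```sql
-- Sketch: discourage sequential scans for this transaction only,
-- then look at the plan the planner picks for the prefix query.
BEGIN;
SET LOCAL enable_seqscan = off;  -- reverts automatically at COMMIT/ROLLBACK

EXPLAIN ANALYZE
SELECT * FROM "products"
WHERE (to_tsvector('english', style_number||' '||brand||' '||style_description||' '||color)
      @@ to_tsquery('english', 'G2000:*'));

ROLLBACK;
```

If the plan now shows a bitmap scan on products_search_document, the index is usable and the planner was simply estimating the sequential scan to be cheaper. This setting is meant for diagnosis, not for production use.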

That said, I think that for prefix queries (where you don't want stemming equivalence to kick in, e.g. postgraduate and postgres being considered equivalent), a btree text_pattern_ops index, a gist(gist_trgm_ops) index, or a gin index (I think spgist might be good too, but I haven't done any metrics on that) on just the concatenated values, or even just on style_number if that is all you will be prefix-matching, would be more efficient than full text. Your query would not use tsvector; it would just use

style_number LIKE 'G5000%'

style_number ILIKE 'G5000%'

and your index would be on just style_number or on the concatenated values.
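A minimal sketch of the btree variant, assuming style_number is a text column (for a varchar column, varchar_pattern_ops is the matching operator class). The index name products_style_number_prefix is made up for illustration.

```sql
-- Sketch: btree index that can serve left-anchored LIKE patterns
-- even when the database does not use the C locale.
-- The index name is hypothetical.
CREATE INDEX products_style_number_prefix
    ON products (style_number text_pattern_ops);

-- Case-sensitive prefix match; the pattern must not start with a wildcard.
EXPLAIN
SELECT * FROM products
WHERE style_number LIKE 'G5000%';
```

Note that this kind of index helps only with case-sensitive LIKE; an ILIKE pattern starting with a letter generally cannot use it.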

If you need case insensitivity, then use gist(gist_trgm_ops) as covered here: http://www.postgresonline.com/journal/archives/212-PostgreSQL-9.1-Trigrams-teaching-LIKE-and-ILIKE-new-tricks.html
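A minimal sketch of the trigram approach, assuming the pg_trgm extension is available on the server and that you are allowed to create extensions. The index name products_style_number_trgm is made up for illustration.

```sql
-- Sketch: trigram index so that ILIKE can use an index.
-- Requires the pg_trgm contrib extension (PostgreSQL 9.1 or later
-- for LIKE/ILIKE index support).
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- The index name is hypothetical.
CREATE INDEX products_style_number_trgm
    ON products USING gist (style_number gist_trgm_ops);

-- Case-insensitive prefix match.
EXPLAIN
SELECT * FROM products
WHERE style_number ILIKE 'G5000%';
```

The same extension also provides gin_trgm_ops for a GIN index; which of the two performs better depends on the data and the write load, so it is worth measuring both.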
