
GIN index, and the difference between gin_trgm_ops and to_tsvector in PostgreSQL

I am trying to understand how to use a GIN index for full-text search in PostgreSQL, and I see that there are two ways to do it. For example, given this table:

CREATE TABLE IF NOT EXISTS users (
    id SERIAL NOT NULL,
    name VARCHAR(512) NOT NULL,
    PRIMARY KEY (id));

we can create the index in either of these ways:

CREATE INDEX users_name_idx ON users USING gin (name gin_trgm_ops);

or

CREATE INDEX users_name_idx ON users USING gin (to_tsvector('language', name));

As I understand it (I may be wrong), the first variant splits the text into three-letter tokens (trigrams) and does not depend on the language. The second variant uses stemming to return a list of word stems, which is why it depends on the language.

My question: is my understanding correct, and in which cases should I use the first variant and in which cases the second?

What you say is correct.
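The trigram side of that description can be illustrated with a small model. In PostgreSQL, gin_trgm_ops comes from the pg_trgm extension (enabled with CREATE EXTENSION pg_trgm), and its show_trgm() function displays the trigrams of a string; per the pg_trgm documentation, 'cat' yields "  c", " ca", "cat" and "at ". The Python function below is a hypothetical, simplified sketch of that extraction (ASCII letters and digits only), not the extension's actual code.

```python
import re

def trigrams(text: str) -> set[str]:
    # Simplified model of pg_trgm's show_trgm(): lowercase the text, split it
    # into words of ASCII letters/digits, pad each word with two leading
    # spaces and one trailing space, then collect every 3-character window.
    # No dictionary or stemmer is involved, so the result is language-independent.
    result: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

print(sorted(trigrams("cat")))  # ['  c', ' ca', 'at ', 'cat']
```

Because the tokens are just character windows, the same function works unchanged for any language written with those characters, whereas to_tsvector needs a text search configuration to know how to stem words.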

In addition to that, and perhaps the most important difference, full-text search can only search for whole words, while a trigram index can be used to search for arbitrary substrings and also to find results that are only similar to the search condition (using the distance operator <->).
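The contrast between whole-word and substring/similarity matching can be sketched in Python. This is a simplified model under stated assumptions: similarity() counts shared trigrams over all distinct trigrams of both strings and distance() is 1 minus that, mirroring pg_trgm's similarity() function and <-> operator; substring_matches() models an unanchored LIKE '%needle%' for a single-word needle, pre-filtering rows by trigrams and then rechecking them; whole_word_matches() is a toy stand-in for to_tsvector/to_tsquery that skips stemming and stop words. All of these helper names are hypothetical.

```python
import re

def trigrams(text: str) -> set[str]:
    # Simplified pg_trgm-style extraction: lowercase, split into words,
    # pad each word with two leading spaces and one trailing space.
    result: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def similarity(a: str, b: str) -> float:
    # Shared trigrams divided by the distinct trigrams of both strings.
    ta, tb = trigrams(a), trigrams(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0

def distance(a: str, b: str) -> float:
    # Mirrors pg_trgm's <-> operator: one minus the similarity.
    return 1.0 - similarity(a, b)

def substring_matches(names: list[str], needle: str) -> list[str]:
    # Models LIKE '%needle%' for a needle that is a single run of letters or
    # digits: its unpadded 3-character windows pre-filter the rows, and each
    # surviving candidate is rechecked against the actual substring test.
    lowered = needle.lower()
    wanted = {lowered[i:i + 3] for i in range(len(lowered) - 2)}
    return [n for n in names if wanted <= trigrams(n) and lowered in n.lower()]

def whole_word_matches(names: list[str], word: str) -> list[str]:
    # Toy stand-in for full-text search: a row matches only if the query is
    # one of its whole tokens (real to_tsvector also stems and drops stop words).
    return [n for n in names if word.lower() in re.findall(r"[0-9a-z]+", n.lower())]

names = ["Alexander", "Alex Smith", "Sandra"]
print(substring_matches(names, "and"))        # ['Alexander', 'Sandra']
print(whole_word_matches(names, "alex"))      # ['Alex Smith']
print(round(similarity("word", "words"), 6))  # 0.571429
```

The substring search finds "and" inside "Alexander" and "Sandra", which a whole-word search could not, while "word" and "words" come out as similar (4 of 7 distinct trigrams shared) without any stemming.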

Unsurprisingly, trigram indexes don't perform well for short search strings, because a short string contains few (or no) trigrams that can narrow down the set of matching rows.
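The short-string problem can be shown with the same simplified model. Assuming an unanchored LIKE '%needle%' search that pre-filters rows by the needle's unpadded 3-character windows (an illustration, not pg_trgm's exact pattern analysis), a two-character needle has no trigram at all, so the pre-filter rules out nothing and every row must be rechecked. The helper names are hypothetical.

```python
import re

def trigrams(text: str) -> set[str]:
    # Same simplified pg_trgm-style extraction used for a row's indexed trigrams.
    result: set[str] = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        padded = "  " + word + " "
        result.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return result

def needle_trigrams(needle: str) -> set[str]:
    # Unpadded 3-character windows of an unanchored LIKE '%needle%' pattern.
    s = needle.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def candidate_fraction(rows: list[str], needle: str) -> float:
    # Share of rows the trigram pre-filter cannot rule out; all of them
    # would still have to be rechecked against the real condition.
    wanted = needle_trigrams(needle)
    kept = [r for r in rows if wanted <= trigrams(r)]
    return len(kept) / len(rows)

rows = ["Alexander", "Alex Smith", "Sandra", "Bob"]
print(candidate_fraction(rows, "al"))    # 1.0: "al" has no trigram, nothing is filtered
print(candidate_fraction(rows, "alex"))  # 0.5
```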
