
What exactly distinguishes fuzzy search from full-text search?

In my project, I have been asked to implement a text query service on the database we are using, PostgreSQL. I have used PostgreSQL's full-text search features, which work fairly well in terms of speed. One problem with full-text search is that it has no fuzzy search capability. On the other hand, there is an extension named pg_trgm that provides functions and operators for determining the similarity of alphanumeric text. There are also several examples of text search using pg_trgm, like:

select actor
    from products
    where actor % 'tomy';

And here is an example of PostgreSQL full-text search (FTS):

SELECT title
FROM pgweb
WHERE to_tsvector(body) @@ to_tsquery('friend');

So, the main question is: what is the difference between these two search strategies? Which one is the more appropriate way to search text? Is it possible to mix them? I should add that performance is an important concern as well. Thanks in advance!

They do completely different things. About the only thing they have in common is that they operate on text and can benefit from indexes. From your question, it seems you already have a good sense of the differences. The appropriate one is the one that does what you want. If one of them were always appropriate, we probably wouldn't have created the other.

You can mix them, but you will need a separate index for each; they cannot share an index. You probably need different tables as well, since full-text search is more appropriate for sentences or paragraphs, while trigram search is better suited to individual words or short phrases.
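As a rough sketch of what "a separate index for each" could look like, using the tables from the question (products.actor and pgweb.body): the index names and the 'english' text search configuration are assumptions made for illustration, not something the question specifies.

```sql
-- Assumes the pg_trgm extension is installable on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Trigram index: supports similarity searches such as actor % 'tomy'.
CREATE INDEX products_actor_trgm_idx        -- hypothetical index name
    ON products USING gin (actor gin_trgm_ops);

-- Full-text index: supports @@ searches on the document body.
-- The two-argument to_tsvector is used because an expression index
-- requires the form that names a configuration explicitly.
CREATE INDEX pgweb_body_fts_idx             -- hypothetical index name
    ON pgweb USING gin (to_tsvector('english', body));

-- The query has to use the same expression for the index to apply.
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'friend');
```

Whether GIN or GiST is the better index type depends on the workload; GIN is used here only as a common starting point.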

One way to mix them would be to have one table of full texts, and another table that lists each distinct word present in any of the full texts. The second table could be used to detect probable typos in the query; once those are corrected using suggestions from trigram searching, run the corrected query against the first table.
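The two-table approach described above might be sketched as follows. It reuses the pgweb table from the question as the table of full texts; the words table, its index name, the misspelled term 'frend', and the 'english' configuration are all hypothetical choices for illustration.

```sql
-- Assumes the pg_trgm extension is installable on this server.
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- Second table: every distinct word found in the full texts.
-- The 'simple' configuration is used so the words are stored unstemmed.
CREATE TABLE words AS
    SELECT word
    FROM ts_stat('SELECT to_tsvector(''simple'', body) FROM pgweb');

CREATE INDEX words_trgm_idx                 -- hypothetical index name
    ON words USING gin (word gin_trgm_ops);

-- Step 1: look up likely corrections for a possibly misspelled term.
SELECT word, similarity(word, 'frend') AS sml
FROM words
WHERE word % 'frend'
ORDER BY sml DESC, word
LIMIT 5;

-- Step 2: run the corrected term against the full texts.
SELECT title
FROM pgweb
WHERE to_tsvector('english', body) @@ plainto_tsquery('english', 'friend');
```

The words table is a snapshot, so it would need to be refreshed as the full texts change; how often is a design decision that depends on how quickly the document collection grows.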

The difference is quite large: with fuzzy search you are looking for similar results, while with full-text search you are looking for exact matches of the search terms (after normalization such as stemming). Which one is more appropriate depends on the use case.

If you don't need fuzziness, don't use it. It carries a significant performance overhead, because it cannot simply match the text exactly but also has to consider other, similar combinations.
