How do I return partial phrase matches with PostgreSQL full-text search without returning too many rows?

I am using the pg_search gem to perform full-text search in PostgreSQL, and it is working well. However, some searches return no video results when they should.

For example, searching for "states of matter" returns 10 results, since the videos have a tag named "states of matter". But "3 states of matter" returns zero results. Similarly, "radiocarbon dating" returns 1 video, but "carbon dating" returns nothing.

Here's how I've set up my model:

# app/models/video.rb
class Video < ApplicationRecord
  include PgSearch::Model

  ...

  pg_search_scope(:user_search, {
    against: {
      title: 'C',
      description: 'D'
    },
    associated_against: {
      tags: { name: 'A' }
    },
    using: {
      tsearch: {
        prefix: true,
        dictionary: "english"
      }
    }
  })

  ...

end

To run a search:

query = "3 states of matter"
results = Video.user_search(query)

If I use other options such as trigram or any_word, it returns too many irrelevant results.

How can I improve my full-text search so that it handles partial matches without returning too many irrelevant results? I would much prefer a solution that uses pg_search, but if I need to move away from the gem, then I will.

I assume you tried the trigram option with its default threshold (i.e. 0.3). You can try increasing the threshold to get stricter matches. Check the gem docs for more info, under Trigram#threshold:

By default, trigram searches find records which have a similarity of at least 0.3 using pg_trgm's calculations. You may specify a custom threshold if you prefer. Higher numbers match more strictly, and thus return fewer results. Lower numbers match more permissively, letting in more results. Please note that setting a trigram threshold will force a table scan as the derived query uses the similarity() function instead of the % operator.

Decide based on your table size: because a custom threshold forces a table scan, it can get slow on large tables.
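As a rough sketch of what that could look like, here is only the using: portion of the scope from the question, with trigram added next to tsearch. The constant name USER_SEARCH_USING is made up for illustration, and 0.5 is an arbitrary starting value rather than a recommendation. As far as I know, pg_search treats a row as a match when any one of the listed features matches, so treat this as something to experiment with rather than a guaranteed fix.

```ruby
# Sketch only: the `using:` options for the scope, with a stricter trigram
# threshold added alongside the existing tsearch settings.
# USER_SEARCH_USING is a hypothetical name; 0.5 is an illustrative value.
USER_SEARCH_USING = {
  tsearch: {
    prefix: true,
    dictionary: "english"
  },
  trigram: {
    threshold: 0.5 # default is 0.3; higher values match more strictly
  }
}.freeze

# Inside the model this hash would replace the inline `using:` value, e.g.
#   pg_search_scope(:user_search, { against: ..., associated_against: ..., using: USER_SEARCH_USING })
```

Trigram matching requires the pg_trgm extension to be enabled in the database.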

To understand how trigram similarity is calculated, see https://www.postgresql.org/docs/9.6/pgtrgm.html; you can then set the threshold based on how strict a comparison you need.
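To get a feel for the numbers before touching the database, the calculation can be approximated in plain Ruby. This is a simplified sketch of what the pg_trgm docs describe (lowercase the text, pad each word with two leading spaces and one trailing space, compare the sets of three-character windows), not the extension's actual code; the method names trigrams and trigram_similarity are my own, and the sketch only handles ASCII letters and digits.

```ruby
require "set"

# Approximate pg_trgm's trigram extraction: lowercase the text, split it into
# alphanumeric words, pad each word with two leading spaces and one trailing
# space, then collect every 3-character window into a set.
def trigrams(text)
  text.downcase.scan(/[a-z0-9]+/).each_with_object(Set.new) do |word, set|
    padded = "  #{word} "
    (0..padded.length - 3).each { |i| set << padded[i, 3] }
  end
end

# Similarity = shared trigrams divided by all distinct trigrams of both strings.
def trigram_similarity(a, b)
  ta = trigrams(a)
  tb = trigrams(b)
  union = ta | tb
  return 0.0 if union.empty?

  (ta & tb).size.to_f / union.size
end

puts trigram_similarity("3 states of matter", "states of matter")
puts trigram_similarity("carbon dating", "radiocarbon dating")
```

In this sketch the first pair scores 17/19 (about 0.89) and the second 12/21 (about 0.57), so both searches from the question would still match with a threshold somewhat above the default 0.3. Real pg_trgm results may differ slightly, so check with SELECT similarity(...) on your own data.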
