Full-text search across multiple columns with PostgreSQL

Question

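I'm just getting started with fuzzy text matching in PostgreSQL. I have two columns: job_title and company_name.

A typical full-text search would concatenate job_title and company_name, then return results for the search text ranked by a single score.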

In my case, though, treating text matches in the two columns equally can be a problem. For example, a Search Engineer at Google Co. should not rank the same as a Google Search position at Engineer Co.

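I know I can assign a different weight to each column. However, I have no reason to treat one as more important than the other.

How can I match keywords against each column separately and get back some kind of "match score" for each keyword?

Something like: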

Jobs.where("(to_tsvector('english', position) @@ plainto_tsquery(:position_q)) AND
(to_tsvector('english', company) @@ plainto_tsquery(:company_q))", position_q: "Search Engineer", company_q: "Google")

Answer 1

As you pointed out, you can concatenate tsvectors:

# select to_tsvector('job description') ||
         to_tsvector('company as keyword') ||
         to_tsvector('job description as body') as vector;
                          vector                           
-----------------------------------------------------------
 'bodi':9 'compani':3 'descript':2,7 'job':1,6 'keyword':5
(1 row)

You can also assign weights to them:

# select (setweight(to_tsvector('job description'), 'A') ||
         setweight(to_tsvector('company as keyword'), 'B') ||
         setweight(to_tsvector('job description as body'), 'D')) as vector;
                            vector                             
---------------------------------------------------------------
 'bodi':9 'compani':3B 'descript':2A,7 'job':1A,6 'keyword':5B
(1 row)

You can also use ts_rank_cd(). In particular, it lets you change how the score is normalized.

http://www.postgresql.org/docs/current/static/textsearch-controls.html
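As a rough sketch of how the weighted vector and ts_rank_cd() might be combined against a real table: the query below assumes a hypothetical table jobs(id, job_title, company_name). The column names come from the question, while the table name and the id column are assumptions. The last argument to ts_rank_cd() is the normalization bit mask; 32 rescales the score to rank / (rank + 1).

```sql
-- Sketch only: assumes a hypothetical table jobs(id, job_title, company_name).
SELECT id,
       ts_rank_cd(
           -- weighted vector built from both columns
           setweight(to_tsvector('english', coalesce(job_title, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(company_name, '')), 'B'),
           plainto_tsquery('english', 'google search engineer'),
           32  -- normalization option: rank / (rank + 1)
       ) AS score
FROM jobs
WHERE (to_tsvector('english', coalesce(job_title, '')) ||
       to_tsvector('english', coalesce(company_name, '')))
      @@ plainto_tsquery('english', 'google search engineer')
ORDER BY score DESC
LIMIT 10;
```

The coalesce() calls are there because concatenating a NULL tsvector would make the whole vector NULL.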

In your case, it sounds like you want to combine two separate queries. An ugly but possibly adequate solution might look like this:

select sum(rank) as rank, ...
from (
   select ...
   union all
   select ...
   ) as sub
group by ...
order by sum(rank) desc
limit 10

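As you can see, it's not very pretty. It's also a fast road to aggregating a potentially huge set of matching rows. IMHO, you're better off sticking with the built-in tsvector approach and adjusting the weights as needed.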
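For illustration only, the UNION ALL outline above could be filled in roughly as follows. This is one possible reading of it, not necessarily what the answer had in mind. It again assumes a hypothetical table jobs(id, job_title, company_name), and the two search strings are taken from the question's example.

```sql
-- Sketch only: assumes a hypothetical table jobs(id, job_title, company_name).
SELECT id, sum(rank) AS rank
FROM (
    -- score the job title against its own keywords
    SELECT id,
           ts_rank_cd(to_tsvector('english', job_title),
                      plainto_tsquery('english', 'search engineer')) AS rank
    FROM jobs
    WHERE to_tsvector('english', job_title)
          @@ plainto_tsquery('english', 'search engineer')
    UNION ALL
    -- score the company name against its own keywords
    SELECT id,
           ts_rank_cd(to_tsvector('english', company_name),
                      plainto_tsquery('english', 'google')) AS rank
    FROM jobs
    WHERE to_tsvector('english', company_name)
          @@ plainto_tsquery('english', 'google')
) AS sub
GROUP BY id
ORDER BY sum(rank) DESC
LIMIT 10;
```

Note that a row matching only one of the two columns still appears in the result, carrying just that column's rank. If both columns must match, adding HAVING count(*) = 2 after the GROUP BY is one way to enforce it.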

Answer 1
1 accepted 2013-06-28 11:12:41
