Full-text search across multiple columns with PostgreSQL

Question

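I have just started using PostgreSQL for fuzzy text matching. I have two columns: job_title and company_name.

A typical full-text search would concatenate job_title and company_name, then return results for the search text ranked by a single score.

However, in my case, treating text matches in the two columns equally can be problematic. For example, Google Co. Search Engineer should not be ranked the same as Engineer Co. Google Search.

I know that I can assign a different weight to each column. However, I have no reason to treat one column as more important than the other.

How can I match keywords against each column separately, and get back some "match score" for each keyword?

Something like: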

Jobs.where("(to_tsvector('english', position) @@ plainto_tsquery(:title_q)) AND
(to_tsvector('english', company) @@ plainto_tsquery(:company_q))", title_q: "Search Engineer", company_q: "Google")

Answer 1

As you noted, you can concatenate tsvectors:

# select to_tsvector('job description') ||
         to_tsvector('company as keyword') ||
         to_tsvector('job description as body') as vector;
                          vector                           
-----------------------------------------------------------
 'bodi':9 'compani':3 'descript':2,7 'job':1,6 'keyword':5
(1 row)

You can also assign weights to them:

# select (setweight(to_tsvector('job description'), 'A') ||
         setweight(to_tsvector('company as keyword'), 'B') ||
         setweight(to_tsvector('job description as body'), 'D')) as vector;
                            vector                             
---------------------------------------------------------------
 'bodi':9 'compani':3B 'descript':2A,7 'job':1A,6 'keyword':5B
(1 row)
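
To turn a weighted vector like this into a score, one option is ts_rank(). The following is only a minimal sketch: it assumes a hypothetical jobs table holding the job_title and company_name columns from the question, and the search string is purely illustrative.

```sql
-- Hypothetical table: jobs(job_title text, company_name text).
-- coalesce() guards against NULL columns, which would otherwise
-- make the whole concatenated vector NULL.
select job_title,
       company_name,
       ts_rank(
         setweight(to_tsvector('english', coalesce(job_title, '')), 'A') ||
         setweight(to_tsvector('english', coalesce(company_name, '')), 'B'),
         plainto_tsquery('english', 'search engineer')
       ) as rank
from jobs
order by rank desc
limit 10;
```

By default ts_rank() weighs the labels D, C, B, A as {0.1, 0.2, 0.4, 1.0}. An optional first argument of type float4[] overrides those values if the two columns should count differently, or the same.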

You can also experiment with ts_rank_cd(). In particular, you can change the way the scores are normalized.

http://www.postgresql.org/docs/current/static/textsearch-controls.html
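
As a rough illustration of the normalization option, the sketch below ranks one made-up document against one made-up query under three settings. The sample text and the column aliases are assumptions chosen for the example.

```sql
-- Same document and query, three normalization settings.
select ts_rank_cd(doc, q, 0)  as raw_rank,   -- 0: ignore document length (default)
       ts_rank_cd(doc, q, 2)  as by_length,  -- 2: divide the rank by the document length
       ts_rank_cd(doc, q, 32) as squashed    -- 32: rank / (rank + 1)
from to_tsvector('english', 'search engineer at a search engine company') as doc,
     plainto_tsquery('english', 'search engineer') as q;
```

The flags are bit masks, so several can be combined with |, for example 2|32.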

In your case, it seems that you want to combine two separate queries instead. An ugly but possibly adequate solution might look like this:

select sum(rank) as rank, ...
from (
   select ...
   union all
   select ...
   ) as sub
group by ...
order by sum(rank) desc
limit 10

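As you can see, it is not very pretty. It is also an open invitation to aggregate a potentially large number of matching rows. IMHO, you are better off sticking with the built-in tsvector arithmetic and tweaking the weights as needed.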
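
For reference, one way the union-based outline above could be filled in is sketched below. This is an assumption-laden example, not a tested solution: the jobs table, its id column, and both search strings are hypothetical, and each branch ranks one column against its own query.

```sql
-- Hypothetical table: jobs(id, job_title, company_name).
select id, sum(rank) as rank
from (
   -- Branch 1: score the job title against the title keywords.
   select id, ts_rank_cd(to_tsvector('english', job_title), q) as rank
   from jobs, plainto_tsquery('english', 'search engineer') as q
   where to_tsvector('english', job_title) @@ q
   union all
   -- Branch 2: score the company name against the company keywords.
   select id, ts_rank_cd(to_tsvector('english', company_name), q) as rank
   from jobs, plainto_tsquery('english', 'google') as q
   where to_tsvector('english', company_name) @@ q
   ) as sub
group by id
order by sum(rank) desc
limit 10;
```

Note that this returns rows matching either column, with rows matching both scoring higher. Requiring a match in both columns, as in the question's example, would need an extra condition such as having count(*) = 2.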
