
Word separators for Postgres full text search with Rails

I'm using pg_search for some text searching within my model. Among other attributes, I have a url field.

Unfortunately, Postgres doesn't seem to treat / and . as word separators, so I cannot search within the url.

Example: searching for test in http://test.com yields no results.

Is there a way to fix this problem, perhaps using another gem or some inline SQL?

As stated in the documentation (and noticed by AJcodez), one solution is to create a dedicated tsvector column for the index, then define a trigger that fills it on insert so that URLs are indexed properly:

CREATE TABLE test_url (id serial PRIMARY KEY, url varchar NOT NULL, url_tsvector tsvector NOT NULL);

This function replaces every run of non-word characters (anything other than letters, digits, and underscores) with a single space and turns the resulting string into a tsvector:

CREATE OR REPLACE FUNCTION generate_url_tsvector(varchar) 
RETURNS tsvector 
LANGUAGE sql 
AS $_$
    SELECT to_tsvector(regexp_replace($1, '[^\w]+', ' ', 'gi'));
$_$;
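
To see what the regular expression does before to_tsvector runs, here is a rough Ruby sketch of the same substitution. The helper name split_url_into_words is hypothetical, and Ruby's \w is ASCII-only by default, while what Postgres treats as a word character can depend on the database locale, so the two may differ on non-ASCII input.

```ruby
# Hypothetical helper mirroring regexp_replace(url, '[^\w]+', ' ', 'g'):
# every run of non-word characters becomes a single space.
def split_url_into_words(url)
  url.gsub(/[^\w]+/, ' ').strip
end

puts split_url_into_words('http://www.google.fr') # prints "http www google fr"
puts split_url_into_words('http://test.com')      # prints "http test com"
```

Each space-separated word then becomes its own lexeme, which is why test can be found inside http://test.com.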

Now create a trigger that calls this function:

CREATE OR REPLACE FUNCTION before_insert_test_url()
RETURNS TRIGGER
LANGUAGE plpgsql AS $_$
BEGIN
  -- Fill the tsvector column from the url being inserted.
  NEW.url_tsvector := generate_url_tsvector(NEW.url);

  RETURN NEW;
END;
$_$;

CREATE TRIGGER before_insert_test_url_trig 
BEFORE INSERT ON test_url 
FOR EACH ROW EXECUTE PROCEDURE before_insert_test_url();
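
Note that this trigger fires only on INSERT, so a later UPDATE of url would leave url_tsvector stale. In a Rails app the trigger is usually created from a migration. The following is only a sketch: url_tsvector_trigger_sql is a hypothetical helper, the trigger name it generates is made up, and it assumes the trigger function before_insert_test_url() already exists. The names are interpolated without escaping, so only trusted constants should be passed in.

```ruby
# Hypothetical helper: builds a CREATE TRIGGER statement that fires on
# INSERT and on UPDATE of the given column, reusing an existing trigger function.
def url_tsvector_trigger_sql(table:, column:, function:)
  <<~SQL
    CREATE TRIGGER before_write_#{table}_trig
    BEFORE INSERT OR UPDATE OF #{column} ON #{table}
    FOR EACH ROW EXECUTE PROCEDURE #{function}();
  SQL
end

puts url_tsvector_trigger_sql(table: 'test_url', column: 'url',
                              function: 'before_insert_test_url')
```

Inside a migration, the returned string could be passed to execute.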

Now, when a url is inserted, the `url_tsvector` field is populated automatically.

INSERT INTO test_url (url) VALUES ('http://www.google.fr');
TABLE test_url;

 id |         url          |           url_tsvector
----+----------------------+-----------------------------------
  2 | http://www.google.fr | 'fr':4 'googl':3 'http':1 'www':2
(1 row)

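To run a full-text search on URLs you only need to query against this field. Use to_tsquery rather than a plain cast to tsquery, so that the search term is stemmed the same way as the indexed lexemes (google becomes googl above):

SELECT * FROM test_url WHERE url_tsvector @@ to_tsquery('google');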
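
If the search term comes from user input, it has to be valid tsquery syntax, and it helps to split it by the same rule used when indexing. A minimal sketch, assuming all terms should be ANDed together; build_tsquery is a hypothetical helper and not part of pg_search:

```ruby
# Hypothetical helper: extracts word-character runs from the input and
# joins them with the tsquery AND operator.
def build_tsquery(input)
  input.scan(/\w+/).join(' & ')
end

puts build_tsquery('google fr')       # prints "google & fr"
puts build_tsquery('http://test.com') # prints "http & test & com"
```

The result can then be passed as a bind parameter to to_tsquery. An empty string means there is nothing to search for, so the caller should skip the query in that case.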

I ended up modifying the pg_search gem to support arbitrary tsvector expressions instead of just column names. The changes are here.

Now I can write:

pg_search_scope :search, 
    against: [[:title , 'B'], ["to_tsvector(regexp_replace(url, '[^\\w]+', ' ', 'gi'))", 'A']],
    using: {tsearch: {dictionary: "simple"}}

A slightly simpler approach: add a mapping for the protocol token type to the simple text search configuration:

ALTER TEXT SEARCH CONFIGURATION simple
    ADD MAPPING FOR protocol
        WITH simple;

You can also add it to the english configuration if you need stemming.

https://www.postgresql.org/docs/13/textsearch-parsers.html

https://www.postgresql.org/docs/13/sql-altertsconfig.html
