Strange result searching with to_tsquery under Postgresql

I got a strange result when searching for an expression like pro-physik.de with tsquery.

If I query for pro-physik:* with tsquery, I expect to get all entries starting with pro-physik. Unfortunately, the entries containing pro-physik.de are missing.

Here are two examples that demonstrate the problem:

Query 1:

select 
    to_tsvector('simple', 'pro-physik.de') @@ 
    to_tsquery('simple', 'pro-physik:*') = true

Result 1: false (should be true)

Query 2:

select 
    to_tsvector('simple', 'pro-physik.de') @@
    to_tsquery('simple', 'pro-p:*') = true

Result 2: true

Does anybody have an idea how I could solve this problem?

The core of the problem is that the parser will parse pro-physik.de as a hostname:

SELECT alias, token FROM ts_debug('simple', 'pro-physik.de');

 alias |     token
-------+---------------
 host  | pro-physik.de
(1 row)
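The same parsing happens when the document is turned into a tsvector. The check below, using the 'simple' configuration from the question, should show that the whole hostname ends up as a single lexeme, so there is no separate pro or physik lexeme for a prefix term to hit. It is a minimal diagnostic sketch; tsvector_to_array assumes PostgreSQL 9.6 or later.

```sql
-- The host token becomes one single lexeme in the tsvector
SELECT to_tsvector('simple', 'pro-physik.de');
-- 'pro-physik.de':1

-- Count the lexemes (tsvector_to_array is available from PostgreSQL 9.6 on)
SELECT cardinality(tsvector_to_array(to_tsvector('simple', 'pro-physik.de')));
-- 1
```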

Compare this with a string that contains no dot:

SELECT alias, token FROM ts_debug('simple', 'pro-physik-de');
      alias      |     token
-----------------+---------------
 asciihword      | pro-physik-de
 hword_asciipart | pro
 blank           | -
 hword_asciipart | physik
 blank           | -
 hword_asciipart | de
(6 rows)

Neither pro-physik nor pro-p is a hostname, so to_tsquery splits each of them into the whole hyphenated word plus its parts, and every part carries the prefix marker:

SELECT to_tsquery('simple', 'pro-physik:*');
              to_tsquery
---------------------------------------
 'pro-physik':* & 'pro':* & 'physik':*
(1 row)

SELECT to_tsquery('simple', 'pro-p:*');
         to_tsquery
-----------------------------
 'pro-p':* & 'pro':* & 'p':*
(1 row)

The first tsquery does not match because physik is not a prefix of pro-physik.de, the only lexeme in the tsvector. The second one matches because pro-p, pro and p are all prefixes of it.
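To verify this reasoning one term at a time, you can evaluate each single-lexeme prefix query separately against the same document. This is a diagnostic sketch only; the terms pro:*, physik:* and p:* are taken from the expanded queries above, and the VALUES list is just a convenient way to test them side by side.

```sql
-- Evaluate each prefix term on its own against the single lexeme 'pro-physik.de'
SELECT q                                      AS prefix_query,
       to_tsvector('simple', 'pro-physik.de')
           @@ to_tsquery('simple', q)         AS matches
FROM (VALUES ('pro:*'), ('physik:*'), ('p:*')) AS t(q);
-- pro:* and p:* match; physik:* does not, which is enough to make
-- the AND-combined query 'pro-physik':* & 'pro':* & 'physik':* fail
```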

As a workaround, replace the dots with spaces before parsing, so the text is no longer recognized as a hostname:

select 
   to_tsvector('simple', replace('pro-physik.de', '.', ' ')) @@ 
   to_tsquery('simple', replace('pro-physik:*', '.', ' '))
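To confirm that the workaround changes what the parser sees, you can compare the original and the dot-free document side by side. After the replacement, pro-physik de is no longer a hostname, so the tsvector should contain separate lexemes (pro-physik, pro, physik and de), and the prefix query from the question can find them. This is a sketch for the single example string only.

```sql
SELECT
    -- original text: one host lexeme
    to_tsvector('simple', 'pro-physik.de')                    AS original_vector,
    -- dots replaced: hyphenated word, its parts, and 'de'
    to_tsvector('simple', replace('pro-physik.de', '.', ' ')) AS workaround_vector,
    to_tsvector('simple', 'pro-physik.de')
        @@ to_tsquery('simple', 'pro-physik:*')               AS original_match,
    to_tsvector('simple', replace('pro-physik.de', '.', ' '))
        @@ to_tsquery('simple', 'pro-physik:*')               AS workaround_match;
-- original_match is false, workaround_match is true
```

Keep in mind that the same replace has to be applied wherever the tsvector is built (for example in an index expression); otherwise the stored vectors and the queries are parsed differently.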
