I got a strange result when searching for an expression like pro-physik.de with tsquery. If I query for pro-physik:* I expect to get all entries starting with pro-physik. Unfortunately, the entries containing pro-physik.de are missing.
Here are two examples that demonstrate the problem:
Query 1:
select
to_tsvector('simple', 'pro-physik.de') @@
to_tsquery('simple', 'pro-physik:*') = true
Result 1: false (should be true)
Query 2:
select
to_tsvector('simple', 'pro-physik.de') @@
to_tsquery('simple', 'pro-p:*') = true
Result 2: true
Does anybody have an idea how I could solve this problem?
The core of the problem is that the parser treats pro-physik.de as a hostname and keeps it as a single token:
SELECT alias, token FROM ts_debug('simple', 'pro-physik.de');
alias | token
-------+---------------
host | pro-physik.de
(1 row)
Compare this:
SELECT alias, token FROM ts_debug('simple', 'pro-physik-de');
alias | token
-----------------+---------------
asciihword | pro-physik-de
hword_asciipart | pro
blank | -
hword_asciipart | physik
blank | -
hword_asciipart | de
(6 rows)
Now pro-physik and pro-p are not hostnames, so you get
SELECT to_tsquery('simple', 'pro-physik:*');
to_tsquery
---------------------------------------
'pro-physik':* & 'pro':* & 'physik':*
(1 row)
SELECT to_tsquery('simple', 'pro-p:*');
to_tsquery
-----------------------------
'pro-p':* & 'pro':* & 'p':*
(1 row)
The first tsquery will not match because physik is not a prefix of pro-physik.de, and the second will match because pro-p, pro and p are all prefixes of it.
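To check this yourself, you can look at the lexeme that is actually stored and test each prefix term of the query on its own. This is only a small sketch using the same 'simple' configuration as above; the comments restate what the explanation above predicts rather than captured output.

```sql
-- The whole string is kept as one host lexeme, so this is the only
-- value a prefix term can be compared against.
SELECT to_tsvector('simple', 'pro-physik.de');

-- Test the three terms of 'pro-physik':* & 'pro':* & 'physik':* separately.
-- Per the explanation above, only the 'physik' term should fail,
-- which is enough to make the AND-ed query fail as a whole.
SELECT to_tsvector('simple', 'pro-physik.de') @@ to_tsquery('simple', 'pro:*')    AS pro_matches,
       to_tsvector('simple', 'pro-physik.de') @@ to_tsquery('simple', 'physik:*') AS physik_matches;
```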
As a workaround, use full text search like this:
select
to_tsvector('simple', replace('pro-physik.de', '.', ' ')) @@
to_tsquery('simple', replace('pro-physik:*', '.', ' '))
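Applied to a table, the same idea could look roughly like the sketch below. The table entries, its column name and the sample rows are made up for illustration and are not from the question; the point is only that the dot is replaced before the text reaches the parser, so the value is no longer recognized as a hostname.

```sql
-- Hypothetical table and data, for illustration only.
CREATE TEMP TABLE entries (name text);
INSERT INTO entries (name)
VALUES ('pro-physik.de'), ('pro-physik-journal'), ('example.org');

-- Replace the dot on the document side so that 'pro-physik.de' is parsed
-- as a hyphenated word plus 'de' instead of a single host token.
SELECT name
FROM entries
WHERE to_tsvector('simple', replace(name, '.', ' '))
      @@ to_tsquery('simple', 'pro-physik:*');
```

Note that this changes how every dotted value is tokenized, so exact hostname searches would need the same replacement applied to the search term.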