
PostgreSQL: how to stop to_tsvector from reducing certain tokens to lexemes

I have a query that uses the to_tsvector function. The documentation says:

to_tsvector parses a textual document into tokens, reduces the tokens to lexemes

But sometimes to_tsvector stems names incorrectly. Is it possible to skip the reduction to lexemes for specific words? For example, disable it for the value "Илья" and keep it enabled in all other cases?

SELECT c.id,
       tsvector_agg(to_tsvector('russian',
                                coalesce(cv.data ->> 'name', '') || ' ' ||
                                coalesce(cv.data ->> 'surname', '')
           )) as v
FROM client c

The proper way to disable stemming for certain words is to include a synonym dictionary in the text search configuration. You'd have to add the names to the synonym file, mapping each name to itself. Any word that the synonym dictionary recognizes is then not passed on to the stemming dictionary that comes after it.

The linked documentation gives an example for the name "Paris", but it will work just as well in your case.
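
As a rough sketch of that setup (not tested against your data): the names names_syn and russian_names are made up for illustration, and it assumes you can place a UTF-8 file names_syn.syn in $SHAREDIR/tsearch_data on the server, containing one line per protected name that maps the name to itself, for example "илья илья".

```sql
-- Hypothetical synonym dictionary; it reads $SHAREDIR/tsearch_data/names_syn.syn.
-- Assumed file content (UTF-8), one "word replacement" pair per line:
--   илья илья
CREATE TEXT SEARCH DICTIONARY names_syn (
    TEMPLATE = synonym,
    SYNONYMS = names_syn
);

-- Work on a copy so the built-in "russian" configuration stays untouched.
CREATE TEXT SEARCH CONFIGURATION russian_names (COPY = russian);

-- Consult the synonym dictionary first. Words it recognizes stop there;
-- everything else falls through to the Snowball stemmer as before.
ALTER TEXT SEARCH CONFIGURATION russian_names
    ALTER MAPPING FOR word, hword, hword_part
    WITH names_syn, russian_stem;

-- Compare the two configurations on the problem name.
SELECT to_tsvector('russian', 'Илья')       AS stemmed,
       to_tsvector('russian_names', 'Илья') AS protected;
```

In the original query you would then pass 'russian_names' instead of 'russian' to to_tsvector. Because the synonym file is read when the dictionary is loaded, edits to the file may need an ALTER TEXT SEARCH DICTIONARY on names_syn or a new session before they take effect.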
