
PostgreSQL full text search: weight/priority on search terms

I am using full text search in PostgreSQL through Django.

I want to associate weights with search terms. I know it is possible to associate different weights with different fields, but I want different weights on the search terms themselves.

Example:

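from django.contrib.postgres.search import SearchQuery, SearchRank, SearchVector

from core.models import SkillName

# Search only the "name" field.
vector = SearchVector("name")
# Match rows containing either term; both terms currently count equally.
search = SearchQuery("Java") | SearchQuery("Spring")
search_result = (
    SkillName.objects.annotate(search=vector)
    .filter(search=search)
    .annotate(rank=SearchRank(vector, search))
    .order_by("-rank")
)
for s in search_result.distinct():
    print(f"{s} rank: {s.rank}")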

Now I want "Java" to be more important than "Spring" and have the ranking reflect that. I guess I could run 2 different searches and multiply the ranks by factors, but is there a better way?

Is it really that weird to want to give search terms different priorities?

Generated SQL for reference. I honestly don't think this is possible in Django right now anyway, and we might need the help of a PostgreSQL guru.

SELECT DISTINCT "core_skillname"."id",
                "core_skillname"."name",
                to_tsvector(COALESCE("core_skillname"."name", '')) AS "search",
                ts_rank(to_tsvector(COALESCE("core_skillname"."name", '')), (plainto_tsquery('Java') || plainto_tsquery('Spring'))) AS "rank"
FROM "core_skillname"
WHERE to_tsvector(COALESCE("core_skillname"."name", '')) @@ (plainto_tsquery('Java') || plainto_tsquery('Spring'))
ORDER BY "rank" DESC;

Applying the ranks with weights doesn't require two queries, just two sub-expressions in the same query.

SELECT DISTINCT "core_skillname"."id",
                "core_skillname"."name",
                to_tsvector(COALESCE("core_skillname"."name", '')) AS "search",
                ts_rank(to_tsvector(COALESCE("core_skillname"."name", '')), plainto_tsquery('Spring')) +
                ts_rank(to_tsvector(COALESCE("core_skillname"."name", '')), plainto_tsquery('Java')) * 1.5 AS "rank"
FROM "core_skillname"
WHERE to_tsvector(COALESCE("core_skillname"."name", '')) @@ (plainto_tsquery('Java') || plainto_tsquery('Spring'))
ORDER BY "rank" DESC;

Since it is so easy to scratch your own itch this way, why invent some other mechanism to do it? When the weights are part of the table rather than the query (as with per-field weights), you couldn't really do it this way, so a dedicated mechanism makes more sense there.
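
The weighted sum of `ts_rank` sub-expressions extends naturally to more than two terms. As a rough sketch only, the helper below builds that expression from a mapping of search terms to weights. The name `weighted_rank_sql` and its parameters are made up for illustration and are not part of Django or PostgreSQL. It assumes `%s`-style placeholders, as used by psycopg2 and Django's raw SQL interfaces.

```python
from typing import List, Mapping, Tuple


def weighted_rank_sql(column: str, term_weights: Mapping[str, float]) -> Tuple[str, List[object]]:
    """Build a parameterized SQL expression: one ts_rank per term, scaled by its weight.

    Hypothetical helper. `column` is interpolated into the SQL text as-is, so it
    must be a trusted identifier and never user input. Terms and weights are
    returned separately as query parameters.
    """
    if not term_weights:
        raise ValueError("at least one search term is required")
    vector = f"to_tsvector(COALESCE({column}, ''))"
    parts = []
    params: List[object] = []
    for term, weight in term_weights.items():
        # One sub-expression per term; the weight scales that term's rank.
        parts.append(f"ts_rank({vector}, plainto_tsquery(%s)) * %s")
        params.extend([term, weight])
    return " + ".join(parts), params


# Same terms and factors as in the SQL above: "Java" counts 1.5x as much as "Spring".
rank_sql, rank_params = weighted_rank_sql(
    '"core_skillname"."name"', {"Spring": 1.0, "Java": 1.5}
)
print(rank_sql)
print(rank_params)
```

The resulting pair could then be passed to something like Django's `RawSQL(rank_sql, rank_params)` in an `annotate()` call, or to a cursor's `execute()` as part of a larger query. Since `SearchRank` is an ordinary query expression, writing `SearchRank(vector, SearchQuery("Spring")) + SearchRank(vector, SearchQuery("Java")) * 1.5` in the ORM may be an even shorter route, though that has not been tested here.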


 