Apache Lucene boost document section

I am working on a project in Apache Lucene 7.2.1 and I want to change how documents are scored, so that the first part of a document (the first 5 words) is twice as relevant as the rest of the document.

As an example:

doc1 = "one two three four five six"

doc2 = "six one two three four five"

query = "six"

The score for doc2 must be twice the score for doc1.

Can you please help me achieve this? I know that in older versions of Lucene there was a setBoost method on Field, but in this version there isn't one. Should the boost be set when a document is indexed, or when the query is made?

Thank you!

Boosting should be done at search time (index-time field boosts were removed in Lucene 7, which is why Field no longer has setBoost). You can achieve this with a BoostQuery.

BoostQuery is itself a Query, so you can combine it with other query types. An abstract example:

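import org.apache.lucene.search.*;  // BooleanQuery, BooleanClause, BoostQuery
// BooleanQuery has no public constructor in Lucene 7; use its Builder.
BooleanQuery.Builder builder = new BooleanQuery.Builder();
builder.add(new BoostQuery(query1, 2f), BooleanClause.Occur.MUST);
builder.add(new BoostQuery(query2, 1f), BooleanClause.Occur.MUST);
BooleanQuery booleanQuery = builder.build();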
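To make the target behaviour from the question concrete, here is a minimal sketch in plain Java. It does not use Lucene at all: it is a toy scorer showing only the weighting arithmetic, where a match in the first five words counts twice as much as a match later on. The class name PositionWeightedScore and the constants HEAD_WORDS, HEAD_BOOST and TAIL_BOOST are hypothetical names chosen for this illustration.

```java
// Toy scorer, NOT Lucene: illustrates the weighting asked for in the question.
// A term match inside the first HEAD_WORDS tokens counts HEAD_BOOST,
// a match anywhere later counts TAIL_BOOST.
final class PositionWeightedScore {
    static final int HEAD_WORDS = 5;     // size of the boosted prefix (from the question)
    static final float HEAD_BOOST = 2f;  // weight of a match in the prefix (assumption: mirrors the 2f boost)
    static final float TAIL_BOOST = 1f;  // weight of a match in the rest of the document

    // Sums the weights of every whitespace-separated token equal to the term.
    static float score(String document, String term) {
        String trimmed = document.trim();
        if (trimmed.isEmpty()) {
            return 0f;
        }
        String[] tokens = trimmed.split("\\s+");
        float total = 0f;
        for (int i = 0; i < tokens.length; i++) {
            if (tokens[i].equalsIgnoreCase(term)) {
                total += (i < HEAD_WORDS) ? HEAD_BOOST : TAIL_BOOST;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(score("one two three four five six", "six")); // prints 1.0
        System.out.println(score("six one two three four five", "six")); // prints 2.0
    }
}
```

In the BoostQuery example above, query1 would play the role of the "first five words" part and query2 the role of the whole text. Two caveats: with Occur.MUST on both clauses a document matching only query2 (such as doc1) would be excluded, so Occur.SHOULD may be closer to what the question wants, and Lucene's real similarity scores are not a simple sum, so an exact factor of two is not guaranteed. One candidate for restricting a query to the start of a field is SpanFirstQuery, though that goes beyond what this answer describes.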

For more details on scoring and boosting in general, see: https://lucene.apache.org/core/7_0_0/core/org/apache/lucene/search/package-summary.html#package.description
