Problem searching with Lucene

Question

I have a Lucene index of around 22,000 documents, but I have been running into a particular problem with it while building a search program.

Each document has title, description and long_description fields, which contain data about different diseases and their symptoms. When I search for a phrase such as "infection of the small intestine", I expect "Cholera" to be the first result. (By the way, I am using MultiFieldQueryParser with StandardAnalyzer.)

I expect Cholera to come first because its long_description field contains the exact phrase "infection of the small intestine". Instead of coming out on top, however, it appears near the bottom, because plenty of other documents mention the term "infection" in the title field, which is much shorter than the description fields. This can easily be seen in the screenshot below.

So just because Cholera does not have the most pertinent information in its title field, it ends up near the bottom. I saw the following thread, which suggests using "~3", but is that what I should do behind the scenes for all my queries? Isn't there a better way of doing it?

Searching phrases in Lucene
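One way to apply the "~3" idea behind the scenes, instead of asking users to type it, is to rewrite every query string before parsing it. The Java sketch below is a minimal illustration under assumptions: the field names come from the question, the slop of 3 comes from the linked thread, and `PhraseBoostQuery`, `sanitize` and `build` are hypothetical helpers, not Lucene APIs. It keeps the plain term query and adds one sloppy phrase clause per field, so a document like Cholera that contains the whole phrase should pick up extra score on top of its term matches.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class PhraseBoostQuery {

    // Hypothetical sanitiser: lower-case the text (so AND/OR/NOT are not read
    // as operators by the classic query parser) and replace anything that is
    // not a letter, digit or whitespace, since many such characters have
    // special meaning in query-parser syntax.
    public static String sanitize(String userText) {
        return userText.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ")
                .trim()
                .replaceAll("\\s+", " ");
    }

    // Keeps the plain terms (MultiFieldQueryParser expands unqualified terms
    // over its fields) and adds one sloppy phrase clause per field.
    public static String build(String userText, List<String> fields, int slop) {
        String cleaned = sanitize(userText);
        if (cleaned.isEmpty()) {
            return ""; // nothing searchable left after sanitising
        }
        List<String> clauses = new ArrayList<>();
        clauses.add("(" + cleaned + ")");
        for (String field : fields) {
            clauses.add(field + ":\"" + cleaned + "\"~" + slop);
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        List<String> fields = List.of("title", "description", "long_description");
        // The resulting string would then be handed to the query parser.
        System.out.println(build("infection of the small intestine", fields, 3));
    }
}
```

The plain-term part keeps recall the same as before; the phrase clauses only add score for documents where the words occur close together.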

Answer 1

Make your query boost hits in title high, description medium and long_description low, like this:

title:intestine^100 description:intestine^10 long_description:intestine^1

This example multiplies the score of a title match by 100, a description match by 10 and a long_description match by 1. Documents with higher total scores are ranked first. You can pick any numbers you like for the boost values.
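Rather than hand-writing these strings, the clauses can be generated from a map of field boosts. A minimal Java sketch, assuming the question's field names and the boost values above; `BoostedQuery`, `build` and `formatBoost` are hypothetical helpers that only produce the query-parser string, which still has to be parsed. Lucene's MultiFieldQueryParser also has a constructor that accepts a per-field boost map, which may be the simpler route if your version provides it.

```java
import java.math.BigDecimal;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class BoostedQuery {

    // Formats a boost without a trailing ".0" (100.0 -> "100", 0.5 -> "0.5").
    public static String formatBoost(double boost) {
        return BigDecimal.valueOf(boost).stripTrailingZeros().toPlainString();
    }

    // Builds "field:term^boost" clauses for every term and every field,
    // e.g. "title:intestine^100 description:intestine^10 ...".
    // Terms are lower-cased and stripped of non-alphanumeric characters
    // (a simplifying assumption for this sketch).
    public static String build(String userText, Map<String, Double> fieldBoosts) {
        String cleaned = userText.toLowerCase(Locale.ROOT)
                .replaceAll("[^\\p{L}\\p{N}\\s]", " ")
                .trim();
        if (cleaned.isEmpty()) {
            return "";
        }
        List<String> clauses = new ArrayList<>();
        for (String term : cleaned.split("\\s+")) {
            for (Map.Entry<String, Double> e : fieldBoosts.entrySet()) {
                clauses.add(e.getKey() + ":" + term + "^" + formatBoost(e.getValue()));
            }
        }
        return String.join(" ", clauses);
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps the field order stable in the output.
        Map<String, Double> boosts = new LinkedHashMap<>();
        boosts.put("title", 100.0);
        boosts.put("description", 10.0);
        boosts.put("long_description", 1.0);
        System.out.println(build("intestine", boosts));
    }
}
```

Keeping the weights in one map makes it easy to experiment with other values, for example raising long_description relative to title.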

Answer 2

You can override computeNorm in DefaultSimilarity. By default it scales each field's contribution by 1/sqrt(number of terms in the field), which is why matches in a short field like title outweigh matches in a long description. Please check http://www.supermind.org/blog/378/lucene-scoring-for-dummies and http://blog.architexa.com/2010/12/custom-lucene-scoring/
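To see the size of the effect, the sketch below computes the length norm in the form used by Lucene 3.x DefaultSimilarity (field boost × 1/sqrt(numTerms)) for two hypothetical field lengths, next to a hypothetical flat norm that an overridden computeNorm could return instead. The field lengths are made up for illustration and `LengthNormDemo` is not a Lucene class. Norms are computed at index time, so a custom Similarity only takes effect once it is used for indexing and the index is rebuilt.

```java
public class LengthNormDemo {

    // Length normalisation in the form used by Lucene 3.x DefaultSimilarity:
    // fieldBoost * 1/sqrt(numTerms). Fewer terms -> larger norm -> higher score.
    public static float defaultLengthNorm(float fieldBoost, int numTerms) {
        if (numTerms <= 0) {
            throw new IllegalArgumentException("numTerms must be positive");
        }
        return fieldBoost * (float) (1.0 / Math.sqrt(numTerms));
    }

    // Hypothetical alternative an overridden computeNorm might return:
    // ignore the field length and keep only the boost.
    public static float flatNorm(float fieldBoost, int numTerms) {
        return fieldBoost;
    }

    public static void main(String[] args) {
        // Hypothetical field lengths, not measured from the asker's index.
        int titleTerms = 4;
        int longDescriptionTerms = 400;

        float titleNorm = defaultLengthNorm(1.0f, titleTerms);
        float longNorm = defaultLengthNorm(1.0f, longDescriptionTerms);
        System.out.println("default norm, title:            " + titleNorm);
        System.out.println("default norm, long_description: " + longNorm);
        System.out.println("ratio title / long_description: " + (titleNorm / longNorm));

        System.out.println("flat norm, title:               " + flatNorm(1.0f, titleTerms));
        System.out.println("flat norm, long_description:    " + flatNorm(1.0f, longDescriptionTerms));
    }
}
```

With these lengths a match in the 4-term title gets ten times the norm of a match in the 400-term long_description; the flat norm removes that imbalance, at the cost of no longer favouring concise fields at all.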

2 answers

Answer 1: 1 vote, 2011-08-09 07:31:06

Answer 2: 0 votes, accepted, 2011-08-09 06:40:10