Best Lucene query for user search input


I have a news store with 1,000,000 records, and I'm using the Lucene library for full-text search over the news items (title, body, news date, ...). I need to find the best query for returning the most relevant results for a user's input. What strategy or algorithm should I use to achieve this?

Right now I'm using something like (title^3.0 body^2.0), but I think it's too simple. I'm looking for a more sophisticated approach that gives more relevant results.
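For reference, the boosted query in the question corresponds to something like the string below in Lucene's classic query-parser syntax. This is a minimal pure-Java sketch that only builds the query string; the class `BoostedQuery` and its methods are hypothetical, and the list of escaped characters is assumed to match the classic parser's special characters. In real code you would more likely rely on Lucene's own `QueryParser.escape` and on `MultiFieldQueryParser`, whose constructor can take a map of per-field boosts.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds a classic Lucene query string that searches
// several fields with per-field boosts, like (title^3.0 body^2.0).
final class BoostedQuery {
    // Characters with special meaning in the classic query syntax (assumed list).
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    /** Prefixes every special character with a backslash. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    /** Produces e.g. "title:(solar power)^3.0 body:(solar power)^2.0". */
    static String build(String userInput, Map<String, Float> fieldBoosts) {
        String terms = escape(userInput.trim());
        if (terms.isEmpty()) {
            return ""; // nothing to search for
        }
        StringBuilder q = new StringBuilder();
        for (Map.Entry<String, Float> e : fieldBoosts.entrySet()) {
            if (q.length() > 0) {
                q.append(' ');
            }
            // Group the user's terms under one field and boost the whole group.
            q.append(e.getKey()).append(":(").append(terms).append(")^").append(e.getValue());
        }
        return q.toString();
    }

    public static void main(String[] args) {
        Map<String, Float> boosts = new LinkedHashMap<>(); // keeps field order stable
        boosts.put("title", 3.0f);
        boosts.put("body", 2.0f);
        System.out.println(build("solar power", boosts));
    }
}
```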

I'd really appreciate your help with this, my Overflow friends!

Improving search relevance takes time and iterative refinement.

The Lucid Imagination team has a good write-up on this (very Solr-based, though): http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Search-Application-Relevance-Issues

You may want to analyse your logs and add the page views per document to your index, so you can factor them into your sort order.

The figures don't have to be very accurate, as long as differences in order of magnitude are captured.
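As a rough sketch of this idea, assuming page views have already been extracted from the logs and are available alongside each hit, the code below re-ranks results by multiplying the relevance score by a log-scaled popularity factor. Using `log10` means only order-of-magnitude differences in views move the ranking much, which is why the counts need not be exact. The `Hit` class, `POPULARITY_WEIGHT`, and the formula itself are illustrative assumptions, not a standard Lucene scoring function.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Sketch: re-rank hits by relevance score times a log-scaled popularity factor.
final class PopularityRerank {
    /** One search hit: document id, relevance score, page views from the logs. */
    static final class Hit {
        final String id;
        final float score;
        final long pageViews;

        Hit(String id, float score, long pageViews) {
            this.id = id;
            this.score = score;
            this.pageViews = pageViews;
        }
    }

    // Hypothetical tuning knob: 0 ignores popularity, larger values favour it more.
    static final double POPULARITY_WEIGHT = 0.5;

    /** score * (1 + log10(1 + views))^weight; 0 views leaves the score unchanged. */
    static double combined(Hit h) {
        double popularity = 1.0 + Math.log10(1.0 + h.pageViews);
        return h.score * Math.pow(popularity, POPULARITY_WEIGHT);
    }

    /** Returns a new list ordered by combined score, highest first. */
    static List<Hit> rerank(List<Hit> hits) {
        Comparator<Hit> byCombined = Comparator.comparingDouble(PopularityRerank::combined);
        List<Hit> sorted = new ArrayList<>(hits);
        sorted.sort(byCombined.reversed());
        return sorted;
    }

    public static void main(String[] args) {
        List<Hit> hits = List.of(
                new Hit("a", 1.0f, 0),
                new Hit("b", 0.8f, 9_999),
                new Hit("c", 2.0f, 0));
        for (Hit h : rerank(hits)) {
            System.out.printf("%s %.3f%n", h.id, combined(h));
        }
    }
}
```

With a weight of 0.5, a document with about 10,000 views has its score multiplied by roughly 2.24, while one with about 1,000 views gets roughly 2.0, so small inaccuracies in the counts barely matter. Lucene can also fold a stored numeric field into scoring or sorting at query time; which classes to use for that depends on the Lucene version.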

You should also analyse the logs for misspellings. The Lucid Imagination guys also had a podcast or blog post about indexing them.
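One simple way to find candidate misspellings in a query log is to look for rare queries that lie within a small edit distance of a much more frequent query. The sketch below assumes the log has already been reduced to a list of query strings; `MisspellingMiner`, `maxDistance`, and `minRatio` are hypothetical names and thresholds to tune. The resulting map could then be used, for example, to index common misspellings alongside the correct terms.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

// Sketch: mine likely misspellings from a query log by edit distance and frequency.
final class MisspellingMiner {
    /** Classic dynamic-programming Levenshtein distance, using two rolling rows. */
    static int editDistance(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j;
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /**
     * Maps each suspected misspelling to the most frequent query within maxDistance
     * edits that is at least minRatio times more frequent (hypothetical thresholds).
     */
    static Map<String, String> mine(List<String> loggedQueries, int maxDistance, int minRatio) {
        Map<String, Integer> counts = new HashMap<>();
        for (String q : loggedQueries) {
            counts.merge(q.trim().toLowerCase(Locale.ROOT), 1, Integer::sum);
        }
        Map<String, String> corrections = new TreeMap<>();
        for (Map.Entry<String, Integer> rare : counts.entrySet()) {
            String best = null;
            int bestCount = 0;
            for (Map.Entry<String, Integer> common : counts.entrySet()) {
                if (common.getKey().equals(rare.getKey())) {
                    continue;
                }
                boolean muchMoreFrequent = common.getValue() >= rare.getValue() * minRatio;
                if (muchMoreFrequent && common.getValue() > bestCount
                        && editDistance(rare.getKey(), common.getKey()) <= maxDistance) {
                    best = common.getKey();
                    bestCount = common.getValue();
                }
            }
            if (best != null) {
                corrections.put(rare.getKey(), best);
            }
        }
        return corrections;
    }

    public static void main(String[] args) {
        List<String> log = List.of("election", "election", "elction", "weather", "weather", "wether");
        System.out.println(mine(log, 2, 2));
    }
}
```

The pairwise comparison is quadratic in the number of distinct queries, which is fine for a sketch but would need a smarter candidate lookup on a large log. Lucene's spell-checking module (for example `DirectSpellChecker`) can also suggest corrections based on the terms already in the index.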
