Lucene title/content searching

I'm storing my Lucene docs like so:

Document doc = new Document();
doc.add(new TextField("contents", "Homer January, Lenny February"));
doc.add(new TextField("title", "2017 on call schedule.xls", Field.Store.YES));

Document doc = new Document();
doc.add(new TextField("contents", "Carl January, Frank February"));
doc.add(new TextField("title", "2018 on call schedule.xls", Field.Store.YES));

I get a hit if I search for the exact title, or for a term like

2017

but no hits if I try things like

call
on call
xls

I've tried simple things like

 Query query1 = new QueryParser("title", analyzer).parse("on call");

and more complicated ideas like

Builder bb = new BooleanQuery.Builder();
for(String chunk : "on call".split(" ")){
    bb.add(new TermQuery(new Term("title", chunk)), BooleanClause.Occur.SHOULD);
}
BooleanQuery booleanQuery = bb.build();

Maybe I'm storing my docs wrong?

I'm using the StandardAnalyzer for both indexing and searching.

It seems like I'm missing something quite fundamental here. Does anyone have any tips?

I think it's always a good idea to visualize your terms before running your search. Below is an image from the Luke tool.

(image: Luke screenshot of the indexed terms)

It shows that the index contains no term schedule, only schedule.xls.
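As a rough illustration of why that happens, here is a plain-Java sketch that mimics only the relevant part of tokenization. It is not Lucene code: TitleTokenSketch and tokenize are made-up names, and the real StandardAnalyzer does more (for example, it may drop stop words). The only point is that a dot between letters does not split the token, which matches the schedule.xls term seen in Luke.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical stand-in for an analyzer; NOT Lucene's StandardAnalyzer.
final class TitleTokenSketch {

    // Split on whitespace and lowercase each piece. Punctuation inside a
    // piece (such as the dot in "schedule.xls") is left untouched.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String raw : text.trim().split("\\s+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> terms = tokenize("2017 on call schedule.xls");
        System.out.println(terms);                           // [2017, on, call, schedule.xls]
        System.out.println(terms.contains("schedule"));      // false
        System.out.println(terms.contains("schedule.xls"));  // true
    }
}
```

An exact lookup for schedule fails in this model for the same reason it fails against the index: the stored term is the whole string schedule.xls.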

I am using Lucene 6.6.6 and had to modify your code to:

// iwriter is an already-opened IndexWriter; Store is org.apache.lucene.document.Field.Store
Document doc = new Document();
doc.add(new TextField("contents", "Homer January, Lenny February", Store.YES));
doc.add(new TextField("title", "2017 on call schedule.xls", Store.YES));
iwriter.addDocument(doc);

doc = new Document();
doc.add(new TextField("contents", "Carl January, Frank February", Store.YES));
doc.add(new TextField("title", "2018 on call schedule.xls", Store.YES));
iwriter.addDocument(doc);

// Make both documents visible to searchers.
iwriter.commit();

Now for searching:

Your query parser is basically producing the query title:schedule, which is an exact term match (no wildcards) on the field title. Since no such term exists in the index, you get zero hits.

Modifying your query to

Query query1 = new QueryParser("title", analyzer).parse("schedule*");

will get you two hits, since the prefix schedule* matches the term schedule.xls in both documents.
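To make the difference between the two queries concrete, the following plain-Java sketch models a term dictionary as a sorted map from term to document ids and compares an exact lookup with a prefix lookup. It does not use Lucene; TermLookupSketch, its methods, and the term lists fed into it are assumptions for illustration, not a dump of the actual index.

```java
import java.util.Collections;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical toy index: sorted term -> ids of documents containing it.
final class TermLookupSketch {
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    void add(int docId, String... terms) {
        for (String term : terms) {
            postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docId);
        }
    }

    // Exact match: the query term must equal an indexed term.
    Set<Integer> exact(String term) {
        return postings.getOrDefault(term, Collections.emptySet());
    }

    // Prefix match: terms are sorted, so all terms that share the prefix
    // are contiguous starting at the first key >= prefix.
    Set<Integer> prefix(String prefix) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break;
            }
            hits.addAll(e.getValue());
        }
        return hits;
    }

    public static void main(String[] args) {
        TermLookupSketch index = new TermLookupSketch();
        index.add(0, "2017", "call", "schedule.xls"); // assumed terms for the first title
        index.add(1, "2018", "call", "schedule.xls"); // assumed terms for the second title
        System.out.println(index.exact("schedule"));  // []
        System.out.println(index.prefix("schedule")); // [0, 1]
    }
}
```

In this model the exact lookup returns nothing while the prefix lookup finds both documents, which mirrors the zero hits for title:schedule and the two hits for schedule*.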

So, as a best practice, always take a look at your indexed data and visualize it before searching.
