Lucene Sample Query

When I search for the phrase "ph1 ph2", it finds texts that contain "ph1" or "ph2".

String line = "ph1 ph2";           
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
Query query = parser.parse(line);  

Does anybody know how to search by:
1) a phrase ("ph1 ph2"). Example: This is sentence ph1 ph2.
2) a phrase with a maximum distance ("ph1 ph2 ~3"). Example: This ph1 is sentence ph2.

PS: I used the standard Lucene indexer to index my files. If this example is not clear, see http://www.lucenetutorial.com/lucene-query-syntax.html

Here's the full code:

String index = "C:/programs/lucenedemo/index";
String field = "contents";
IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
IndexSearcher searcher = new IndexSearcher(reader);
Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
//QueryParser parser = new QueryParser(Version.LUCENE_40, field, analyzer);
String line = "ph1 ph2";
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
Query query = parser.parse(line);
//doPagingSearch(searcher, query, hitsPerPage, raw, queries == null && queryString == null);
//doPagingSearch

TopDocs results = searcher.search(query, 300000);
ScoreDoc[] hits = results.scoreDocs;
System.out.println(results.totalHits);

// Print at most the first 10 hits; the search may return fewer than 10.
for (int i = 0; i < Math.min(10, hits.length); i++) {
    Document doc = searcher.doc(hits[i].doc);
    String path = doc.get("path");
    if (path != null) System.out.println((i + 1) + ". " + path);
}

//end of doPagingSearch
reader.close();

You may want to use a SpanQuery.

Specifically, you can create a SpanNearQuery, passing the constructor an array of SpanTermQuery objects (one for each term in the phrase), an int representing the "slop", or maximum distance, and a boolean indicating whether the terms must appear in order.
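To make the slop and in-order parameters concrete, here is a minimal plain-Java sketch of the check they describe for two terms. It is only a model, not Lucene code: the class and method names are made up for illustration, the token list stands in for whatever the analyzer would produce, and Lucene's own slop accounting may differ in details.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative model only: hypothetical names, no Lucene classes involved.
final class OrderedSlopSketch {

    // True if `first` occurs before `second` with at most `slop`
    // other token positions between the two terms.
    static boolean matchesInOrder(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            // Furthest position at which `second` may still appear.
            int last = Math.min(tokens.size() - 1, i + 1 + slop);
            for (int j = i + 1; j <= last; j++) {
                if (tokens.get(j).equals(second)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> adjacent = Arrays.asList("this", "is", "sentence", "ph1", "ph2");
        List<String> separated = Arrays.asList("this", "ph1", "is", "sentence", "ph2");

        System.out.println(matchesInOrder(adjacent, "ph1", "ph2", 0));  // true
        System.out.println(matchesInOrder(separated, "ph1", "ph2", 0)); // false
        System.out.println(matchesInOrder(separated, "ph1", "ph2", 3)); // true
    }
}
```

In the second example sentence from the question, two tokens sit between ph1 and ph2, so a slop of 3 is enough while a slop of 0 or 1 is not.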

To search, use the getSpans method on the query that you have created.

Note that this gives you a list of all matching occurrences (spans), not a list of matching documents. Depending on how you want to present the results, you may need to iterate over the spans and group them by document. If you only need the matching documents, the same query can also be passed to IndexSearcher.search, since a SpanQuery is an ordinary Query.
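The grouping step could look roughly like the sketch below. It assumes you have already copied each span's document id and start/end positions into a small holder object; SpanHit and groupByDocument are hypothetical names for illustration, and the Lucene iteration that would fill the list is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: hypothetical names, no Lucene classes involved.
final class SpanGroupingSketch {

    // One occurrence: document id plus start and end positions of the match.
    static final class SpanHit {
        final int doc;
        final int start;
        final int end;

        SpanHit(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Collects occurrences per document id, keeping document ids sorted.
    static Map<Integer, List<SpanHit>> groupByDocument(List<SpanHit> hits) {
        Map<Integer, List<SpanHit>> byDoc = new TreeMap<>();
        for (SpanHit hit : hits) {
            byDoc.computeIfAbsent(hit.doc, d -> new ArrayList<>()).add(hit);
        }
        return byDoc;
    }

    public static void main(String[] args) {
        List<SpanHit> hits = Arrays.asList(
                new SpanHit(7, 1, 5),
                new SpanHit(2, 0, 2),
                new SpanHit(7, 10, 12));

        for (Map.Entry<Integer, List<SpanHit>> entry : groupByDocument(hits).entrySet()) {
            System.out.println("doc " + entry.getKey() + ": " + entry.getValue().size() + " occurrence(s)");
        }
    }
}
```

Each key of the resulting map is then one matching document, which is closer to what the loop over hits in the question prints.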

I'm not clear on exactly what you are looking for, but I believe it's one of:

  • "field:\"" + line + "\"" : Simple phrase query. Finds the two terms adjacent and in order.

  • "field:\"" + line + "\"~3" : Phrase query with slop. Matches the two terms in order with up to three positions of separation between them; with a slop of two or more, a transposed pair can also match.

  • "field:(" + line + ")" : Not a phrase query at all. Simple search for the two terms. Any order or distance is acceptable.
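As a quick check of what those three expressions evaluate to, here is a small plain-Java sketch that builds the strings. The class and method names are hypothetical, and it assumes the search text contains no double quotes or other query syntax characters that would need escaping.

```java
// Illustrative sketch only: hypothetical helper names; only the query syntax comes from Lucene.
final class PhraseQueryStrings {

    // field:"ph1 ph2" (terms adjacent and in order)
    static String exactPhrase(String field, String text) {
        return field + ":\"" + text + "\"";
    }

    // field:"ph1 ph2"~3 (phrase with a slop)
    static String sloppyPhrase(String field, String text, int slop) {
        return exactPhrase(field, text) + "~" + slop;
    }

    // field:(ph1 ph2) (plain term search, not a phrase)
    static String anyTerms(String field, String text) {
        return field + ":(" + text + ")";
    }

    public static void main(String[] args) {
        String line = "ph1 ph2";
        System.out.println(exactPhrase("contents", line));     // contents:"ph1 ph2"
        System.out.println(sloppyPhrase("contents", line, 3)); // contents:"ph1 ph2"~3
        System.out.println(anyTerms("contents", line));        // contents:(ph1 ph2)
    }
}
```

Any of these strings can be handed to parser.parse(...) from the question's code in place of the bare line. Since that parser is already constructed with a default field, the field prefix should be optional there.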

You can see further options on query parser syntax in Lucene's query syntax documentation.
