
Lucene Sample Query

When I search for the phrase "ph1 ph2", it finds texts that contain "ph1" or "ph2".

String line = "ph1 ph2";           
QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
Query query = parser.parse(line);  

Does anybody know how to search by 1) an exact phrase ("ph1 ph2"), for example to match "This is sentence ph1 ph2", and 2) a phrase with a maximum distance ("ph1 ph2 ~3"), for example to match "This ph1 is sentence ph2"?

PS: I used the standard Lucene indexer to index my files. If this example is not clear, see http://www.lucenetutorial.com/lucene-query-syntax.html

Here's full code:

    String index = "C:/programs/lucenedemo/index";
    String field = "contents";
    IndexReader reader = DirectoryReader.open(FSDirectory.open(new File(index)));
    IndexSearcher searcher = new IndexSearcher(reader);
    Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_40);
    //QueryParser parser = new QueryParser(Version.LUCENE_40, field, analyzer);
    String line = "ph1 ph2";
    QueryParser parser = new QueryParser(Version.LUCENE_CURRENT, field, analyzer);
    Query query = parser.parse(line);
    //doPagingSearch(searcher, query, hitsPerPage, raw, queries == null && queryString == null);

    TopDocs results = searcher.search(query, 300000);
    ScoreDoc[] hits = results.scoreDocs;

    // Print at most the first 10 hits; the bound avoids running past the end of hits.
    for (int i = 0; i < Math.min(10, hits.length); i++) {
        Document doc = searcher.doc(hits[i].doc);
        String path = doc.get("path");
        if (path != null) System.out.println((i + 1) + ". " + path);
    }
    //end of doPagingSearch

You may want to use a SpanQuery.

Specifically, you can create a SpanNearQuery, passing the constructor an array of SpanTermQuery objects (one for each term in the phrase), an int representing the "slop", or maximum distance, and a boolean indicating whether the terms must appear in order.

To search, use the getSpans method on the query that you have created.

Note that this will give you a list of all such occurrences, and not a list of matching documents. Depending on how you would like to present the results, you may need to iterate over the spans and group them according to document, etc.
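The grouping step mentioned above could be sketched roughly as follows. This is plain Java with no Lucene dependency: the Occurrence class is a hypothetical holder for the document id and start/end positions you would copy out of each span, and the sample values in main are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class SpanGrouping {

    // Hypothetical value type (not a Lucene class): one matched occurrence.
    static final class Occurrence {
        final int doc;    // document id the occurrence was found in
        final int start;  // position of the first matched term
        final int end;    // position just past the last matched term

        Occurrence(int doc, int start, int end) {
            this.doc = doc;
            this.start = start;
            this.end = end;
        }
    }

    // Groups occurrences by document id, keeping first-seen document order.
    static Map<Integer, List<Occurrence>> groupByDoc(List<Occurrence> occurrences) {
        Map<Integer, List<Occurrence>> byDoc = new LinkedHashMap<>();
        for (Occurrence o : occurrences) {
            List<Occurrence> list = byDoc.get(o.doc);
            if (list == null) {
                list = new ArrayList<>();
                byDoc.put(o.doc, list);
            }
            list.add(o);
        }
        return byDoc;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for the spans of a real search.
        List<Occurrence> found = new ArrayList<>();
        found.add(new Occurrence(0, 3, 5));
        found.add(new Occurrence(0, 10, 14));
        found.add(new Occurrence(2, 1, 3));

        for (Map.Entry<Integer, List<Occurrence>> e : groupByDoc(found).entrySet()) {
            System.out.println("doc " + e.getKey() + ": " + e.getValue().size() + " occurrence(s)");
        }
    }
}
```

If you only need the matching documents rather than every occurrence, the grouped map's key set already gives you that list.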

I'm not clear on exactly what you are looking for, but I believe it's one of:

  • "field:\"" + line + "\"" : Simple phrase query. Finds the two terms adjacent and in order.

  • "field:\"" + line + "\"~3" : Phrase query with slop. Matches when the two terms are within three position moves of the exact phrase. Slop is an edit distance, so a slop of 2 or more can also match the two terms in reversed order.

  • "field:(" + line + ")" : Not a phrase query at all. Simple search for the two terms. Any order or distance is acceptable.

You can see further options for the query parser syntax in Lucene's query syntax documentation.
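As a minimal sketch of building those three query strings, the helper below uses only string concatenation; the resulting strings would then be passed to parser.parse(...) as in the question. The class and method names are hypothetical (they are not part of Lucene), and the sketch assumes the phrase itself contains no double quotes or other characters that would need escaping.

```java
// Hypothetical helper (not part of Lucene): builds query strings for QueryParser.
final class PhraseQueryStrings {

    // Exact phrase: terms adjacent and in order, e.g. contents:"ph1 ph2"
    static String exactPhrase(String field, String phrase) {
        return field + ":\"" + phrase + "\"";
    }

    // Phrase with slop, e.g. contents:"ph1 ph2"~3
    static String sloppyPhrase(String field, String phrase, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        return exactPhrase(field, phrase) + "~" + slop;
    }

    // Not a phrase: a plain search for the terms, e.g. contents:(ph1 ph2)
    static String anyTerms(String field, String terms) {
        return field + ":(" + terms + ")";
    }

    public static void main(String[] args) {
        String field = "contents";
        String line = "ph1 ph2";
        System.out.println(exactPhrase(field, line));     // contents:"ph1 ph2"
        System.out.println(sloppyPhrase(field, line, 3)); // contents:"ph1 ph2"~3
        System.out.println(anyTerms(field, line));        // contents:(ph1 ph2)
    }
}
```

Note that the quotes are part of the query string itself, which is why they are escaped inside the Java string literals. Passing the unquoted line to the parser, as in the question, yields a search for the individual terms rather than the phrase.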
