
Lucene: sort results by the distance between query words

I'm using Lucene 2.9.4 on my website. The site has a simple text input where the user types a query and searches.

Example:

When the input is "Gói thầu số 15", query.toString() returns: (BID_NM:gói BID_NM:thầu BID_NM:số BID_NM:15).

The results come back in the wrong order. Instead of ranking documents that contain the whole phrase "Gói thầu số 15" first, the top results contain only individual words, i.e. "gói", "thầu" or "số".

My Query method:

public static Query getQuery(String keyword) throws ParseException {
    try {
        // Parse the raw input against NAME; with the default OR operator each word becomes an optional clause.
        return MultiFieldQueryParser.parse(Version.LUCENE_29, new String[]{keyword},
                new String[]{"NAME"}, new StandardAnalyzer(Version.LUCENE_29));
    } catch (ParseException e) {
        // The input contained query-syntax characters: escape them and parse again.
        keyword = MultiFieldQueryParser.escape(keyword);
        return MultiFieldQueryParser.parse(Version.LUCENE_29, new String[]{keyword},
                new String[]{"NAME"}, new StandardAnalyzer(Version.LUCENE_29));
    }
}

Search:

IndexReader reader = null;
Query query = null;
Filter filter = null;   // not used in this snippet
try {
    reader = IndexReader.open(directory, true);   // read-only reader
    IndexSearcher searcher = new IndexSearcher(reader);
    query = getQuery(keyword);
    System.out.println(query.toString());
    // Up to 10000 hits, ordered by score
    TopDocs topDocs = searcher.search(query, null, 10000, Sort.RELEVANCE);
    ScoreDoc[] hits = topDocs.scoreDocs;
} catch (Exception exc) {
    exc.printStackTrace();
} finally {
    if (reader != null) {
        try {
            reader.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

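You probably want a phrase search. The parsed query above is just an OR of four single-term queries, so any document containing one of the words matches, and the score does not take into account how close together the words are. A phrase query does. One tutorial on the subject: http://www.avajava.com/tutorials/lessons/how-do-i-query-for-words-near-each-other-with-a-phrase-query.html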
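Since the question already goes through MultiFieldQueryParser, one low-effort way to get a phrase query is to wrap the input in the query parser's proximity syntax, "word1 word2 ..."~slop, which the parser turns into a PhraseQuery with that slop. The sketch below is a hypothetical helper, not part of Lucene; it assumes that double quotes and backslashes are the only characters needing care inside a quoted phrase and simply replaces them with spaces. It uses plain Java with no Lucene dependency so it can be run on its own.

```java
/**
 * Hypothetical helper (not a Lucene API): builds query-parser proximity syntax
 * "w1 w2 ..."~slop from raw user input.
 * Assumption: '"' and '\' are the only characters that need handling inside a
 * quoted phrase, so they are replaced with spaces rather than escaped.
 */
final class PhraseQueryString {

    static String build(String keyword, int slop) {
        if (slop < 0) {
            throw new IllegalArgumentException("slop must be >= 0");
        }
        // Replace quote/backslash with spaces, then collapse runs of whitespace.
        String cleaned = keyword.replace("\"", " ")
                                .replace("\\", " ")
                                .trim()
                                .replaceAll("\\s+", " ");
        if (cleaned.isEmpty()) {
            throw new IllegalArgumentException("keyword has no words");
        }
        return "\"" + cleaned + "\"~" + slop;
    }

    public static void main(String[] args) {
        // ASCII stand-in for the question's input
        System.out.println(build("Goi thau so 15", 3)); // prints "Goi thau so 15"~3
    }
}
```

The result could be passed to MultiFieldQueryParser.parse in place of the raw keyword; the slop of 3 is only an example value. Keep in mind that a phrase query only matches documents containing every word. If single-word matches should still appear further down the list, one option worth testing is to append the phrase as an extra, boosted OR clause, e.g. keyword + " " + PhraseQueryString.build(keyword, 3) + "^2" (the plain keyword part still needs its own escaping).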

The key point is the setSlop() method of PhraseQuery, which sets the maximum distance allowed between the query's words in a matching document; the default slop of 0 requires the words to appear next to each other, in order. Also, if specific phrases in your documents need to be recognized automatically at indexing time, you may find the following tutorial useful.
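To make "distance" concrete, here is a toy model in plain Java (not Lucene code) of the value compared against the slop, as we understand Lucene 2.9's sloppy phrase matching: query term i found at document position p gets the adjusted position p - i, and a document matches when max(adjusted) - min(adjusted) <= slop. The sketch assumes distinct query terms and already-analyzed tokens, brute-forces all occurrence combinations, and uses hypothetical class and method names; the tokens are ASCII stand-ins for the Vietnamese words.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Toy model (not Lucene code) of sloppy phrase distance.
 * Assumes distinct query terms and tokens that are already analyzed.
 */
final class SlopDistance {

    /** Smallest slop at which doc matches the phrase, or -1 if a term is missing. */
    static int minSlop(List<String> query, List<String> doc) {
        if (query.isEmpty()) {
            return 0;
        }
        List<List<Integer>> adjusted = new ArrayList<>();
        for (int i = 0; i < query.size(); i++) {
            List<Integer> positions = new ArrayList<>();
            for (int p = 0; p < doc.size(); p++) {
                if (doc.get(p).equals(query.get(i))) {
                    positions.add(p - i); // shift by the term's offset in the query
                }
            }
            if (positions.isEmpty()) {
                return -1; // a phrase query requires every term
            }
            adjusted.add(positions);
        }
        return search(adjusted, 0, Integer.MAX_VALUE, Integer.MIN_VALUE);
    }

    // Try one occurrence per term and keep the narrowest spread (exponential; toy only).
    private static int search(List<List<Integer>> adjusted, int term, int min, int max) {
        if (term == adjusted.size()) {
            return max - min;
        }
        int best = Integer.MAX_VALUE;
        for (int pos : adjusted.get(term)) {
            best = Math.min(best,
                    search(adjusted, term + 1, Math.min(min, pos), Math.max(max, pos)));
        }
        return best;
    }

    public static void main(String[] args) {
        List<String> query = Arrays.asList("goi", "thau", "so", "15");
        System.out.println(minSlop(query, Arrays.asList("goi", "thau", "so", "15")));               // prints 0
        System.out.println(minSlop(query, Arrays.asList("goi", "thau", "xay", "dung", "so", "15"))); // prints 2
        System.out.println(minSlop(Arrays.asList("a", "b"), Arrays.asList("b", "a")));               // prints 2
    }
}
```

Under this model, two extra words inside the phrase need a slop of 2, and swapping two adjacent words also costs 2, which is why a slop of 0 only accepts the exact phrase.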

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM