Java, Lucene : Sort search results with highest hit rate.

Question

I am working on a Spring-MVC application in which I save user data and use Lucene to index and search it. The functionality currently works. Is it possible to sort the results so that the best matches come first? Each index entry currently holds a paragraph or more of text. Thank you.

Save code :

Directory directory = org.apache.lucene.store.FSDirectory.open(path);
IndexWriterConfig config = new IndexWriterConfig(new SimpleAnalyzer());
IndexWriter indexWriter = new IndexWriter(directory, config);
indexWriter.commit();
org.apache.lucene.document.Document doc = new org.apache.lucene.document.Document();
if (filePath != null) {
    File file = new File(filePath); // current directory
    doc.add(new TextField("path", file.getPath(), Field.Store.YES));
}
doc.add(new StringField("id", String.valueOf(objectId), Field.Store.YES));
FieldType fieldType = new FieldType(TextField.TYPE_STORED);
fieldType.setTokenized(false);
if (groupNotes != null) {
    doc.add(new Field("contents", text + "\n" + tagFileName + "\n" + String.valueOf(groupNotes.getNoteNumber()), fieldType));
} else {
    doc.add(new Field("contents", text + "\n" + tagFileName, fieldType));
}

Search code :

File file = new File(path.toString());
if ((file.isDirectory()) && (file.list().length > 0)) {
    if (text.contains(" ")) {
        String[] textArray = text.split(" ");
        for (String str : textArray) {
            Directory directory = FSDirectory.open(path);
            IndexReader indexReader = DirectoryReader.open(directory);
            IndexSearcher indexSearcher = new IndexSearcher(indexReader);
            Query query = new WildcardQuery(new Term("contents", "*" + str + "*"));
            TopDocs topDocs = indexSearcher.search(query, 100);

            for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
                System.out.println("Score is " + scoreDoc.score);
                org.apache.lucene.document.Document document = indexSearcher.doc(scoreDoc.doc);
                objectIds.add(Integer.valueOf(document.get("id")));
            }
            indexSearcher.getIndexReader().close();
            directory.close();
        }
    }
}
}

Thank you.

Answer 1

Your question is not entirely clear to me, so the answers below are guesses.

IndexSearcher has search methods that take an org.apache.lucene.search.Sort argument:
public TopFieldDocs search(Query query, int n, Sort sort, boolean doDocScores, boolean doMaxScore) throws IOException
or
public TopFieldDocs search(Query query, int n, Sort sort) throws IOException

See if these methods solve your issue.
If you simply want to sort by score, don't collect only the document ids. Collect the score too, in a POJO that has a score field.
Collect all of these POJOs in a List, then sort that list by score outside the loop.

for (ScoreDoc hit : hits) {
    // additional code that builds the pojo
    pojo.setScore(hit.score);
    list.add(pojo);
}

Then, outside the for loop:

list.sort((POJO p1, POJO p2) -> Float.compare(p2.getScore(), p1.getScore()));
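
To make the POJO idea concrete, here is a minimal sketch in plain Java. The names ScoredHit, HitRanker and idsByDescendingScore are hypothetical and are not part of Lucene. The sketch assumes the id is an int (as in the question's search code) and the score is the float that ScoreDoc exposes. Note that the hits of a single search(query, n) call already come back ordered by descending score, so an explicit sort like this mainly matters when hits from several queries are combined, as in the per-word loop in the question.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical POJO: pairs the stored "id" field with the score Lucene reported.
final class ScoredHit {
    private final int objectId;
    private final float score;

    ScoredHit(int objectId, float score) {
        this.objectId = objectId;
        this.score = score;
    }

    int getObjectId() { return objectId; }

    float getScore() { return score; }
}

// Hypothetical helper, not part of Lucene.
final class HitRanker {
    private HitRanker() {}

    // Returns the object ids ordered from highest score to lowest.
    static List<Integer> idsByDescendingScore(List<ScoredHit> hits) {
        // Copy first so the caller's list keeps its original order.
        List<ScoredHit> sorted = new ArrayList<>(hits);
        // Comparing b to a (instead of a to b) gives descending order.
        sorted.sort((a, b) -> Float.compare(b.getScore(), a.getScore()));
        List<Integer> ids = new ArrayList<>();
        for (ScoredHit hit : sorted) {
            ids.add(hit.getObjectId());
        }
        return ids;
    }
}
```

In the question's loop this would presumably mean adding new ScoredHit(Integer.parseInt(document.get("id")), scoreDoc.score) to a list declared before the loop over the words, then calling HitRanker.idsByDescendingScore(...) once after that loop finishes. A document matched by several words will appear once per match, so duplicates may still need to be merged or dropped.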

solution1
0 2017-07-05 04:27:24
