WildcardQuery Lucene does not work properly

I am trying to use WildcardQuery:

    IndexSearcher indexSearcher = new IndexSearcher(ireader);
    Term term = new Term("phrase", QueryParser.escape(partOfPhrase) + "*");
    WildcardQuery wildcardQuery = new WildcardQuery(term);
    LOG.debug(partOfPhrase);
    Sort sort = new Sort(new SortField("freq", SortField.Type.LONG,true));
    ScoreDoc[] hits = indexSearcher.search(wildcardQuery, null, 10, sort).scoreDocs;

When I enter "san " (without the quotes, with a trailing space), I want to get results such as "san diego" and "san antonio". Instead, I also get "sandals" (there must be a space after "san") and "juelz santana" (I only want phrases that start with "san"). How can I fix this?

EDIT: Also, if I enter "san d", I get no results.

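One possible way to solve this problem is to use a different analyzer, one that does not split the document text on spaces. If the field is indexed with a tokenizing analyzer (for example StandardAnalyzer), each word becomes a separate term, so no indexed term contains a space. A WildcardQuery is matched against individual terms, which is why "san d" finds nothing.

One such analyzer is KeywordAnalyzer, which treats the entire field value as a single token.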
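To make the difference concrete, here is a minimal plain-Java sketch that does not use Lucene at all. It assumes a trailing-`*` wildcard behaves like a prefix test against each indexed term, and it imitates the two analysis strategies with simple string handling. The class and method names (`WildcardTermSketch`, `tokenizedTerms`, `keywordTerms`, `prefixMatches`) are made up for illustration. Details such as lowercasing are ignored.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java simulation (no Lucene): "prefix*" is modelled as a prefix test on each term. */
public class WildcardTermSketch {

    /** Imitates a tokenizing analyzer: every whitespace-separated word becomes its own term. */
    public static List<String> tokenizedTerms(List<String> phrases) {
        List<String> terms = new ArrayList<>();
        for (String phrase : phrases) {
            terms.addAll(Arrays.asList(phrase.trim().split("\\s+")));
        }
        return terms;
    }

    /** Imitates KeywordAnalyzer: the whole phrase is kept as a single term. */
    public static List<String> keywordTerms(List<String> phrases) {
        return new ArrayList<>(phrases);
    }

    /** Returns the terms that a "prefix*" wildcard would match under this simplified model. */
    public static List<String> prefixMatches(List<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String term : terms) {
            if (term.startsWith(prefix)) {
                hits.add(term);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> phrases =
                Arrays.asList("san diego", "san antonio", "sandals", "juelz santana");
        List<String> tokenized = tokenizedTerms(phrases);
        List<String> keyword = keywordTerms(phrases);

        // Word-level terms: "san*" also hits words that merely begin with "san".
        System.out.println(prefixMatches(tokenized, "san"));   // [san, san, sandals, santana]
        // No word-level term contains a space, so nothing matches.
        System.out.println(prefixMatches(tokenized, "san d")); // []
        // Whole-phrase terms: the space is part of the term, so the prefix works as intended.
        System.out.println(prefixMatches(keyword, "san "));    // [san diego, san antonio]
        System.out.println(prefixMatches(keyword, "san d"));   // [san diego]
    }
}
```

Under the word-level model, "sandals" and "santana" match because they are terms that start with "san", while "san d" can never match a single word. Keeping the phrase as one term removes both problems.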

The essential part of the test:

    Directory dir = new RAMDirectory();
    Analyzer analyzer = new KeywordAnalyzer();
    IndexWriterConfig iwc = new IndexWriterConfig(analyzer);
    iwc.setOpenMode(IndexWriterConfig.OpenMode.CREATE);
    IndexWriter writer = new IndexWriter(dir, iwc);

Later on, I add the documents I need:

    Document doc = new Document();
    doc.add(new TextField("text", "san diego", Field.Store.YES));
    writer.addDocument(doc);

And finally, search as needed:

    IndexReader reader = DirectoryReader.open(dir);
    IndexSearcher searcher = new IndexSearcher(reader);

    Term term = new Term("text", QueryParser.escape("san ") + "*");
    WildcardQuery wildcardQuery = new WildcardQuery(term);

My test works properly: it retrieves "san diego" and "san antonio" but not "sandals". The full test is available here: https://github.com/MysterionRise/information-retrieval-adventure/blob/master/src/main/java/org/mystic/lucene/WildcardQueryWithSpace.java

For more information about the analyzer itself, see http://lucene.apache.org/core/4_10_2/analyzers-common/org/apache/lucene/analysis/core/KeywordAnalyzer.html
