
Lucene: Search with partial words

I am integrating Lucene into our application. Whole-word searches already work: when I search for "Upload" and a document contains the text "Upload", the document is found. When I search for the partial word "Uplo", however, nothing is returned. Any ideas?

Code:

    // Open the existing index and create a searcher over it.
    Directory directory = FSDirectory.open(path);
    IndexReader indexReader = DirectoryReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);

    // Parse the user's text against the "contents" field.
    QueryParser queryParser = new QueryParser("contents", new SimpleAnalyzer());
    Query query = queryParser.parse(text);

    // Collect the ids of the top 50 hits and print each hit.
    TopDocs topDocs = indexSearcher.search(query, 50);
    for (ScoreDoc scoreDoc : topDocs.scoreDocs) {
        org.apache.lucene.document.Document document = indexSearcher.doc(scoreDoc.doc);
        objectIds.add(Integer.valueOf(document.get("id")));
        System.out.println();
        System.out.println("id " + document.get("id"));
        System.out.println("content " + document.get("contents"));
    }
    return objectIds;

Thank you.

'Upload' is probably stored as a single token in your Lucene index. A token is the smallest unit the index can match and is not split any further, so a query for 'Uplo' finds no matching term. If you want to match partial words like 'Uplo', consider n-gram indexing, for which Lucene provides n-gram tokenizers and token filters. Note that n-gram indexing stores several terms per word, so the inverted index needs more space.
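To make the n-gram idea concrete, here is a minimal plain-Java sketch of edge n-grams (prefixes) for a single token. It is only an illustration of which terms would end up in the index, not Lucene's own implementation; the class name EdgeNGramSketch and the method edgeNGrams are hypothetical, and the sketch assumes minGram is at least 1.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: hypothetical helper, not part of Lucene.
class EdgeNGramSketch {

    // Returns the lowercase prefixes of `token` whose length lies in
    // [minGram, maxGram], capped at the token length. Assumes minGram >= 1.
    public static List<String> edgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<>();
        String lower = token.toLowerCase(Locale.ROOT);
        int upper = Math.min(maxGram, lower.length());
        for (int len = minGram; len <= upper; len++) {
            grams.add(lower.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        List<String> indexed = edgeNGrams("Upload", 2, 6);
        System.out.println(indexed);                  // [up, upl, uplo, uploa, upload]
        System.out.println(indexed.contains("uplo")); // true
    }
}
```

One token becomes five index terms here, which is where the extra space goes. In Lucene itself this kind of expansion is typically done at index time by an n-gram token filter or tokenizer in the analysis chain; check the documentation of your Lucene version for the exact constructors.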

You can use wildcard searches.

Use the "?" symbol for a single-character wildcard search and the "*" symbol for a multiple-character wildcard search (0 or more characters).

Example: "Uplo*"

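Change

    Query query = queryParser.parse(text);

To

    queryParser.setAllowLeadingWildcard(true); // a leading "*" is rejected by default
    Query query = queryParser.parse("*" + text + "*");

A leading wildcard makes Lucene examine every term in the field, which can be slow on a large index. If matching the start of a word is enough, use text + "*" on its own.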
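Concatenating raw user input into a query string can break parsing when the input contains query-syntax characters such as ":" or "(". The following is a hedged plain-Java sketch of building a prefix-wildcard query string from user input. The class PrefixQueryStrings and its methods are hypothetical helpers written for this illustration; the escaping mimics what Lucene's QueryParser.escape is for, and the lowercasing assumes the index was built with an analyzer that lowercases terms, as SimpleAnalyzer does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Illustration only: hypothetical helper, not part of Lucene.
class PrefixQueryStrings {

    // Characters with special meaning in the classic query syntax.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&/";

    // Prefixes every special character with a backslash.
    public static String escape(String term) {
        StringBuilder sb = new StringBuilder();
        for (char c : term.toCharArray()) {
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    // Turns each whitespace-separated word into an escaped, lowercase
    // prefix query by appending a trailing "*".
    public static String toPrefixQuery(String userInput) {
        List<String> parts = new ArrayList<>();
        for (String word : userInput.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // blank input yields an empty query string
            }
            parts.add(escape(word.toLowerCase(Locale.ROOT)) + "*");
        }
        return String.join(" ", parts);
    }

    public static void main(String[] args) {
        System.out.println(toPrefixQuery("Uplo"));      // uplo*
        System.out.println(toPrefixQuery("Upload fi")); // upload* fi*
    }
}
```

The resulting string would then be passed to queryParser.parse(...). Only a trailing wildcard is added, so the parser's default ban on leading wildcards does not come into play.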

Lucene supports single and multiple character wildcard searches within single terms (not within phrase queries).

To perform a single character wildcard search use the "?" symbol.

To perform a multiple character wildcard search use the "*" symbol.

The single character wildcard search looks for terms that match the pattern with exactly one character in place of the "?". For example, to search for "text" or "test" you can use the search:

te?t

A multiple character wildcard search looks for 0 or more characters. For example, to search for test, tests or tester, you can use the search:

test*

You can also use the wildcard searches in the middle of a term.

te*t

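Note: By default, you cannot use a * or ? symbol as the first character of a search. The classic QueryParser rejects leading wildcards unless setAllowLeadingWildcard(true) has been called on it.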
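The wildcard rules above can be checked with a small plain-Java sketch that translates a wildcard pattern into a regular expression and tests it against whole terms. This only models the matching semantics of "?" and "*"; it is not how Lucene evaluates wildcard queries internally, and the class WildcardSketch and its methods are hypothetical names chosen for this illustration.

```java
import java.util.regex.Pattern;

// Illustration only: hypothetical helper, not part of Lucene.
class WildcardSketch {

    // "?" becomes "any one character", "*" becomes "zero or more characters",
    // and every other character is matched literally.
    public static Pattern toRegex(String wildcard) {
        StringBuilder sb = new StringBuilder();
        for (char c : wildcard.toCharArray()) {
            if (c == '?') {
                sb.append('.');
            } else if (c == '*') {
                sb.append(".*");
            } else {
                sb.append(Pattern.quote(String.valueOf(c)));
            }
        }
        return Pattern.compile(sb.toString());
    }

    // True if the whole term matches the wildcard pattern.
    public static boolean matches(String wildcard, String term) {
        return toRegex(wildcard).matcher(term).matches();
    }

    public static void main(String[] args) {
        System.out.println(matches("te?t", "text"));    // true
        System.out.println(matches("test*", "tester")); // true
        System.out.println(matches("te*t", "tests"));   // false
        System.out.println(matches("uplo*", "upload")); // true
    }
}
```

Note that the pattern must match the entire term, which is why "te*t" does not match "tests".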
