Better search results using Lucene

Original 2010-02-22 10:26:03 3 3 java/ lucene

I've got a database with a lot of books in it. I've got fields like title, descriptions, authors etc.

I'm indexing title with a boost of 100f and description with a boost of 0.1f, both fields tokenized and stemmed.

I'm searching with a single input field that searches in all available fields, using a BooleanQuery joined with BooleanClause.Occur.SHOULD and containing a WildcardQuery for each field. I also remove all "stopwords" from the query to start with.

The problem I'm having is when I search for the following string (without the quotes):

"de wetenschap van het leven". After removing the stop words I get "wetenschap leven".

The title query becomes "*wetenschap* *leven*", and the description query is the same, with a wrapping BooleanQuery joined with BooleanClause.Occur.SHOULD.

The following books are in the db

Wetenschappelijk denken. Een inleiding voor de medische en biomedische wetenschappen en voor de andere levenswetenschap.
De wetenschap van de aarde. Over een levende planeet
Atlas van de menselijke levensloop
De wetenschap van het leven. Over eenheid in biologische diversiteit

The book I'm looking for is returned within the first 4 results, which is good, but in this implementation we cut off at 3 and the rest is behind a "read more" link. Just raising the cutoff is not an option.

To me, the book "De wetenschap van het leven. Over eenheid in biologische diversiteit" matches the query "more" than the others (or so I feel), but I'm unable to find the correct index/search combination to make this work. Does anyone have an idea?
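To see why the wildcard clauses cannot separate these titles, here is a rough plain-Java sketch. It is not Lucene code: it approximates a `*term*` clause as a case-insensitive substring test on the whole title, and the class and method names are made up for illustration. Real Lucene matches wildcards against indexed (stemmed) terms, so treat this only as an approximation.

```java
import java.util.List;
import java.util.Locale;

final class WildcardOverlap {
    // Counts how many query terms occur as substrings of the title.
    // Assumption: this stands in for one *term* wildcard clause per term.
    static int matchedTerms(String title, List<String> terms) {
        String haystack = title.toLowerCase(Locale.ROOT);
        int hits = 0;
        for (String term : terms) {
            if (haystack.contains(term.toLowerCase(Locale.ROOT))) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<String> terms = List.of("wetenschap", "leven");
        List<String> titles = List.of(
            "Wetenschappelijk denken. Een inleiding voor de medische en biomedische wetenschappen en voor de andere levenswetenschap.",
            "De wetenschap van de aarde. Over een levende planeet",
            "Atlas van de menselijke levensloop",
            "De wetenschap van het leven. Over eenheid in biologische diversiteit");
        for (String title : titles) {
            System.out.println(matchedTerms(title, terms) + "  " + title);
        }
    }
}
```

Three of the four titles contain both substrings, so with only wildcard SHOULD clauses the wanted book has nothing that sets it apart from "Wetenschappelijk denken" or "De wetenschap van de aarde".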

3 answers

A few suggestions:

Do not remove stop words - they seem to be an important part of your search query.
Do not use wildcards - search just for the words you need. I believe the best option is to use a PhraseQuery, e.g. "de wetenschap van het leven".
Do not search past sentence end. This is tougher - you may need to index each sentence separately.
Read Debugging Relevance Issues in Search - you will probably get other ideas there.
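As a rough illustration of the phrase idea, here is a plain-Java sketch (not the Lucene PhraseQuery API). It assumes a very simple tokenizer (lowercase, split on non-letters, stop words kept) and treats a phrase match as the query tokens appearing consecutively in the title. The class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PhraseMatch {
    // Lowercases the text and splits it on anything that is not a letter.
    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // True when the phrase tokens occur consecutively, in order, in the title.
    static boolean containsPhrase(String title, String phrase) {
        return Collections.indexOfSubList(tokenize(title), tokenize(phrase)) >= 0;
    }

    public static void main(String[] args) {
        String phrase = "de wetenschap van het leven";
        System.out.println(containsPhrase(
            "De wetenschap van het leven. Over eenheid in biologische diversiteit", phrase)); // true
        System.out.println(containsPhrase(
            "De wetenschap van de aarde. Over een levende planeet", phrase)); // false
    }
}
```

With the stop words kept, only one of the four titles in the question contains the whole phrase, which is the kind of signal the wildcard clauses cannot provide.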

I think a SpanQuery (specifically a SpanNearQuery) might be what you need.

Given a document "a quick brown fox jumps over a lazy dog",

it can find a match for "brown fox" and "lazy dog". You can use the slop setting to control the allowed distance between the search phrases/terms. In short, it gives you a lot of tools to tweak your search.
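To make the slop idea concrete, here is a simplified plain-Java sketch. It is not the SpanNearQuery implementation: it handles only two single terms in order, and it assumes slop means the maximum number of tokens allowed between them. The names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

final class NearMatch {
    // True when `second` follows `first` with at most `slop` tokens in between.
    static boolean near(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!tokens.get(i).equals(first)) {
                continue;
            }
            for (int j = i + 1; j < tokens.size(); j++) {
                if (tokens.get(j).equals(second) && j - i - 1 <= slop) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        List<String> doc = Arrays.asList("a quick brown fox jumps over a lazy dog".split(" "));
        System.out.println(near(doc, "brown", "fox", 0)); // true: adjacent
        System.out.println(near(doc, "quick", "fox", 0)); // false: "brown" sits in between
        System.out.println(near(doc, "quick", "fox", 1)); // true: one token of slack
    }
}
```

A small slop keeps matches tight (close to a phrase match), while a larger one lets unrelated words sit between the terms.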

Also, I'm unfamiliar with the Dutch(?) language, but you might want to stem your queries if possible, and avoid leading wildcards - they are quite expensive and lead to lower precision and recall.

I improved the relevance by adding a phrase search for the entire string as well. This way we still get the "search in everything" behavior, and the titles are a lot more relevant than the rest.
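A toy plain-Java sketch of that combination, under stated assumptions: keyword hits are approximated by substring tests, the phrase clause by a consecutive-token match on the raw query, and `PHRASE_BONUS` is an arbitrary, hypothetical weight. None of this is Lucene's actual scoring; it only shows how an extra phrase clause can lift the exact title above the others.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

final class PhraseBoostRanking {
    static final int PHRASE_BONUS = 10; // hypothetical weight, not a Lucene value

    static List<String> tokenize(String text) {
        return Arrays.stream(text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+"))
                .filter(token -> !token.isEmpty())
                .collect(Collectors.toList());
    }

    // One point per keyword found as a substring, plus a bonus for the full phrase.
    static int score(String title, String rawQuery, List<String> keywords) {
        String lower = title.toLowerCase(Locale.ROOT);
        int score = 0;
        for (String keyword : keywords) {
            if (lower.contains(keyword)) {
                score++;
            }
        }
        if (Collections.indexOfSubList(tokenize(title), tokenize(rawQuery)) >= 0) {
            score += PHRASE_BONUS;
        }
        return score;
    }

    // Returns the titles sorted by descending score; ties keep their input order.
    static List<String> rank(List<String> titles, String rawQuery, List<String> keywords) {
        List<String> ranked = new ArrayList<>(titles);
        ranked.sort(Comparator.comparingInt(
                (String title) -> score(title, rawQuery, keywords)).reversed());
        return ranked;
    }
}
```

Calling `rank` with the four titles from the question, the raw query "de wetenschap van het leven" and the keywords "wetenschap" and "leven" puts "De wetenschap van het leven. Over eenheid in biologische diversiteit" first, while the other titles still match on keywords alone.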




solution1
2 ACCEPTED 2010-02-23 08:41:09

solution2
1 2010-03-07 20:17:54

solution3
0 2010-02-24 09:22:24
