
How to configure tokenizers for indexing and searching with Lucene and NHibernate

This question is about using Lucene through the NHibernate.Search namespace.

I'm indexing a Title field in the index, with the value Grey's Anatomy:

Title : "Grey's Anatomy"

Using Luke, I can see that the title is being tokenized into:

Title: anatomy
Title: grey

Now, I get a result if I search for:

"grey" or "grey's"

However, if I search for "greys" then I get nothing.

I would like "greys" to return a result too. I guess the same issue could come up with any word that contains an apostrophe.

So, here are some questions:

  1. Am I right in thinking I could fix this either by changing something at index time (so, changing the tokenizer?) or by changing something at query time (the query parser?)?
  2. If there is a solution, could someone provide a small code sample?

thanks

With a classic Term query in Lucene, greys will most likely not show up in the results unless you do some careful tokenizing at index time. As I see it, you have two choices, or a third that combines them:

  1. Use a stemmer for both the indexed data and the query. Stemmers are fast, and implementations of Porter's stemmer are easy to find with a search on Google. The drawback is that a stemmer is language-specific, so it becomes a problem when you need to handle different languages.
  2. Use fuzzy queries. With a FuzzyQuery you can set how far, in edit distance, a match may be from the word being searched. Keep in mind that two words being close by edit distance (i.e., Levenshtein distance) doesn't mean they are the same word, but the Grey / Grey's / Greys problem should be solved by setting an edit distance of 2.
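
To make the edit-distance point concrete, here is a minimal, library-free sketch in Python. It is not the Lucene or NHibernate.Search API; `levenshtein` and `fuzzy_match` are hypothetical helper names, and `max_edits=2` simply mirrors the distance suggested above. It shows how far apart the three spellings are, and why a fuzzy match can also pull in unrelated words.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or keep on a match
            ))
        previous = current
    return previous[-1]


def fuzzy_match(query_term: str, indexed_term: str, max_edits: int = 2) -> bool:
    """Hypothetical helper: accept a term if it is within max_edits of the query."""
    return levenshtein(query_term.lower(), indexed_term.lower()) <= max_edits


if __name__ == "__main__":
    for pair in [("grey", "greys"), ("grey", "grey's"), ("greys", "grey's"), ("great", "grey")]:
        print(pair, levenshtein(*pair))
```

Under these assumptions, "greys" is one edit away from the indexed token "grey", so it would match. The word "great" is two edits away from "grey" and would match as well, which is the false-positive risk mentioned above.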

I think you will be able to find a decent implementation of the Porter stemmer here.
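
The key idea behind the stemmer option is that the same normalization runs at index time and at query time. Here is a rough, library-free Python sketch of that idea. It is a toy in-memory index, not Lucene, and `normalize` is a deliberately naive stand-in for a real stemmer (it is not Porter's algorithm); all function names here are hypothetical.

```python
import re


def normalize(token: str) -> str:
    """Naive stand-in for a stemmer: lowercase, drop possessive 's, drop a plural s."""
    token = token.lower()
    token = re.sub(r"['’]s$", "", token)                # grey's -> grey
    token = token.replace("'", "").replace("’", "")     # remove any other apostrophes
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        token = token[:-1]                              # greys -> grey
    return token


def analyze(text: str) -> list:
    """Split text into word tokens and normalize each one."""
    return [normalize(t) for t in re.findall(r"[A-Za-z0-9'’]+", text)]


def build_index(docs: dict) -> dict:
    """Map each normalized term to the set of document ids that contain it."""
    index = {}
    for doc_id, title in docs.items():
        for term in analyze(title):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index: dict, query: str) -> set:
    """Run the query through the same analyze() and AND the per-term hits together."""
    results = None
    for term in analyze(query):
        hits = index.get(term, set())
        results = hits if results is None else results & hits
    return results or set()


if __name__ == "__main__":
    index = build_index({1: "Grey's Anatomy", 2: "Glass House"})
    for q in ["grey", "grey's", "greys"]:
        print(q, search(index, q))
```

Because "Grey's", "grey" and "greys" all normalize to the same term, all three queries find the title. In Lucene the equivalent would be to use the same stemming analyzer for both indexing and query parsing, so that both sides produce the same terms.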

Hope this helps!

