
Is there a Lucene/Solr spell checker that can handle space insertion/removal typos?

As far as I know, almost all of them check spelling one query term at a time and cannot make changes across the whole input query to increase coverage in the corpus. I have one in LingPipe, but it is very expensive... http://alias-i.com/lingpipe/demos/tutorial/querySpellChecker/read-me.html

So my question is: what is the best Apache alternative to a LingPipe-like spell checker?

The spellcheckers in Lucene treat whitespace like any other character. So in general you can feed them your query logs or whatever, and spellcheck/autocomplete full queries.

For Lucene this should just work; for Solr you need to ensure the QueryConverter doesn't split up your terms... see https://issues.apache.org/jira/browse/SOLR-3143
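
To illustrate the whitespace point, here is a rough plain-Python sketch. It is not Lucene's or Solr's API, just the idea: the whole query is compared against logged queries with an edit distance in which a space costs the same as any other character. The names edit_distance, correct_query and QUERY_LOG, the sample log and the max_edits threshold are all made up for the example.

```python
def edit_distance(a, b):
    """Levenshtein distance; a space costs the same as any other character."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def correct_query(query, logged_queries, max_edits=2):
    """Return the closest logged query within max_edits, preferring frequent ones.

    Falls back to the input unchanged when nothing is close enough.
    """
    best, best_key = query, None
    for candidate, count in logged_queries.items():
        distance = edit_distance(query, candidate)
        if distance <= max_edits:
            key = (distance, -count)  # smaller distance first, then higher count
            if best_key is None or key < best_key:
                best, best_key = candidate, key
    return best


# Hypothetical query log: full query string -> number of times it was searched.
QUERY_LOG = {"new york times": 120, "facebook login": 300, "new yorker": 40}

print(correct_query("newyork times", QUERY_LOG))    # missing space
print(correct_query("face book login", QUERY_LOG))  # extra space
```

Because the dictionary entries are full queries, a missing or extra space is a single edit, the same as a mistyped letter. A linear scan like this only suits a small log; it is meant to show the idea, not how a real index does the lookup.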

On the other hand, these suggesters currently work on the whole input, so if you want to suggest queries that have never been searched before, you instead want something that maybe only takes the last N words of context, similar to http://googleblog.blogspot.com/2011/04/more-predictions-in-autocomplete.html .

I'm hoping we will soon provide that style of suggester as an alternative too, possibly under https://issues.apache.org/jira/browse/LUCENE-3842 .

But keep in mind, that's not suitable for all purposes, so I think it's likely going to be just an option. For example, if you are doing e-commerce there is no sense in suggesting products you don't sell :)
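
For what the "last N words of context" idea might look like, here is a minimal Python sketch. It is only an assumption about the general approach, not what LUCENE-3842 implements: index every run of up to N consecutive words from the query log together with the word that followed it, then look up only the tail of the user's input, backing off to shorter contexts. The function names, the sample log and the default n=2 are invented for illustration.

```python
from collections import Counter, defaultdict


def build_context_index(logged_queries, n=2):
    """Map each run of 1..n consecutive words to a Counter of the words that followed it."""
    index = defaultdict(Counter)
    for query, count in logged_queries.items():
        words = query.split()
        for i in range(1, len(words)):
            for k in range(1, n + 1):
                if i - k < 0:
                    break
                context = tuple(words[i - k:i])
                index[context][words[i]] += count
    return index


def suggest_next(prefix, index, n=2, limit=3):
    """Suggest next words from the last n words of the input, backing off to shorter contexts."""
    words = prefix.split()
    for k in range(min(n, len(words)), 0, -1):
        context = tuple(words[-k:])
        if context in index:
            return [word for word, _ in index[context].most_common(limit)]
    return []


# Hypothetical query log: full query string -> number of times it was searched.
QUERY_LOG = {
    "cheap hotels in paris": 50,
    "best restaurants in paris": 30,
    "hotels in london": 20,
}

index = build_context_index(QUERY_LOG)
# "luxury hotels in" was never searched, but its last two words were seen.
print(suggest_next("luxury hotels in", index))
```

This also shows the caveat above: the suggester will happily complete "luxury hotels in" even if nothing in the catalogue matches the resulting query.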

