Can Solr/Lucene do Fuzzy Field Collapsing?

Edit

Can Solr do fuzzy field collapsing, i.e. collapsing on fields that have similar values rather than identical ones?

I'd assumed that it could, but now I'm not sure, which makes my original question below invalid.

Original Question

Given a large set of values, I need to decide which one is the most prevalent. The set of values will change over time, so I expect the output may change over time too.

I gather Solr can do "field collapsing" to group results by a given field, with a tolerance for similarity. Would it be possible, nay, even appropriate, to use Solr solely to collapse fields in order to derive the most common value? We use Solr in other parts of the business, and it would be good to leverage existing code rather than home-brewing a custom solution.

No, Solr does not support fuzzy collapsing (at least not according to what is documented on the wiki).

Solr 4.0 supports group.func, which lets you group results by the value of a FunctionQuery, so it's possible that at some point a function could be created that gets you approximately what you want. None of the existing functions will do it, though.
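To make group.func concrete, here is a minimal Python sketch that only builds the query string for a grouped request. It assumes a hypothetical numeric field called price and uses the existing rint and div functions to bucket values into ranges of 10; this is numeric bucketing, not fuzzy string matching, which is exactly the limitation described above.

```python
from urllib.parse import urlencode


def build_group_func_params(function_query, rows=10):
    """Return a URL-encoded query string for a grouped Solr request.

    function_query is passed straight through as the group.func value.
    """
    params = {
        "q": "*:*",
        "group": "true",
        "group.func": function_query,
        "rows": rows,
    }
    return urlencode(params)


# "price" is a hypothetical field name used only for illustration.
query_string = build_group_func_params("rint(div(price,10))")
print(query_string)
```

Append the resulting string to your own select handler URL. Note that grouping by function was not supported in distributed searches when it was introduced, so check this against your setup.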

However, Solr does support result clustering, which may work for your use case. Clustering is done with Carrot2. If you limit the fields used by Carrot2 to a single field, you may get a result similar to "fuzzy collapsing", but you have far less control over what Carrot2 does than you do with field collapsing.

For a normal document you might want all your fields analyzed by Carrot2, e.g.:

carrot.title=my_title&carrot.snippet=my_title,my_description

But if you have, for example, a manufacturer field with slight variations in spelling or punctuation, it might work to give Carrot2 only that single field for both title and snippet:

carrot.title=manufacturer&carrot.snippet=manufacturer
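As a rough illustration, the single-field setup could be assembled into a full request like this. The sketch is in Python and only builds the URL; the host, port and /clustering handler path are assumptions about your installation, and it presumes the clustering component is already configured in solrconfig.xml.

```python
from urllib.parse import urlencode


def build_clustering_url(base_url, field, query="*:*", rows=100):
    """Build a request URL that asks Carrot2 to cluster on one field only."""
    params = {
        "q": query,
        "rows": rows,
        "clustering": "true",          # enable the clustering component
        "clustering.results": "true",  # cluster the search results
        "carrot.title": field,         # same field for title...
        "carrot.snippet": field,       # ...and for snippet
    }
    return base_url + "?" + urlencode(params)


# The base URL below is hypothetical; substitute your own handler.
url = build_clustering_url("http://localhost:8983/solr/clustering", "manufacturer")
print(url)
```

Clustering only sees the documents returned by the query, so rows should be large enough to cover the values you want compared.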
