How to improve proximity search in Solr

When I search for "company" in Solr, the results should include similar matches, for example "company", "comp-any", and "company". How can I get that result using Solr?

For the use case you provided, you can use n-grams.

<analyzer>
  <tokenizer class="solr.StandardTokenizerFactory"/>
  <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="7"/>
</analyzer>

This filter breaks each token into parts of the specified sizes. For example, the word "company" produces the following tokens: "com", "omp", "mpa", "pan", "any", "comp", "ompa", "mpan", "pany", "compa", "ompan", "mpany", "compan", "ompany", "company"
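As a rough illustration of what the filter does, here is a plain Python sketch (not Solr's actual implementation; `char_ngrams` is a hypothetical helper name) that produces the same grams for "company". It also hints at why a hyphenated spelling can still match, under the assumption that the tokenizer splits "comp-any" into the two tokens "comp" and "any".

```python
def char_ngrams(token, min_gram=3, max_gram=7):
    """Return every substring of token whose length is between min_gram and max_gram."""
    grams = []
    for size in range(min_gram, max_gram + 1):
        # Slide a window of `size` characters across the token.
        for start in range(len(token) - size + 1):
            grams.append(token[start:start + size])
    return grams


company_grams = char_ngrams("company")
print(company_grams)

# Assumption: the tokenizer splits "comp-any" at the hyphen into two tokens.
variant_grams = set()
for part in ["comp", "any"]:
    variant_grams.update(char_ngrams(part))

# Grams shared by both spellings are what lets them match each other.
shared = sorted(variant_grams & set(company_grams))
print(shared)
```

The shared grams are what a query for one spelling would hit in a document containing the other, provided the same analyzer is applied at both index time and query time.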

TAKE CARE: This filter may degrade performance and make your index grow significantly, because every token is stored as many overlapping grams. Depending on the size of the fields you use it on (e.g. fields holding extracted document content), Solr may even run out of memory. So choose wisely which fields to use it on :)
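To get a feel for the cost, this small Python sketch counts how many grams a single token turns into for the minGramSize="3" and maxGramSize="7" settings used above. `gram_count` is a hypothetical helper, and the numbers only count grams per token; actual index size depends on your data and on how many distinct grams occur.

```python
def gram_count(token_length, min_gram=3, max_gram=7):
    """Number of n-grams produced for one token of the given length."""
    total = 0
    # Gram sizes longer than the token itself produce nothing.
    for size in range(min_gram, min(max_gram, token_length) + 1):
        total += token_length - size + 1
    return total


for length in (3, 7, 12, 20):
    print(length, gram_count(length))
```

Narrowing the gap between the minimum and maximum gram size, or applying the filter only to short fields such as names or titles, reduces the number of grams per token.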

Here is some useful information about it, with examples: https://lucene.apache.org/solr/guide/6_6/filter-descriptions.html#FilterDescriptions-N-GramFilter
