
Solr Spellcheck for Multi Word Phrases

I have a problem with Solr spellcheck suggestions for multi-word phrases. For the query 'red chillies':

q=red+chillies&wt=xml&indent=true&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true

I get

<lst name="suggestions">
  <lst name="chillies">
    <int name="numFound">2</int>
    <int name="startOffset">4</int>
    <int name="endOffset">12</int>
    <int name="origFreq">0</int>
    <arr name="suggestion">
      <lst><str name="word">chiller</str><int name="freq">4</int></lst>
      <lst><str name="word">challis</str><int name="freq">2</int></lst>
    </arr>
  </lst>
  <bool name="correctlySpelled">false</bool>
  <str name="collation">red chiller</str>
</lst>

The problem is that although 'chiller' has 4 results in the index, 'red chiller' has none, so we end up suggesting a phrase with 0 results.

What can I do to make spellcheck work on the whole phrase only? I tried using KeywordTokenizerFactory in the query analyzer:

<fieldType name="text_spell" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/> 
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

And I also tried adding

<str name="sp.query.extendedResults">false</str>

within

<lst name="spellchecker">

in solrconfig.xml.

But neither seems to make a difference.

What would be the best way to make spellcheck return only collations that have results for the whole phrase? Thanks!

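The real issue here is that you need to specify spellcheck.collateParam.q.op=AND and also (optionally) spellcheck.collateParam.mm=100%. These parameters are applied to the queries Solr runs to test each candidate collation, so every term must match and a collation is kept only if the whole phrase has results.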

You can read more about this in the Solr documentation.
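
As a rough sketch of how these parameters might be set as handler defaults in solrconfig.xml: the handler name /select and the values 5 and 1 are assumptions for illustration, not taken from the question. Note that Solr only tests collations against the index when spellcheck.maxCollationTries is greater than 0 (its default is 0), so the collateParam overrides have no effect without it.

```xml
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.extendedResults">true</str>
    <str name="spellcheck.collate">true</str>
    <!-- Test up to 5 candidate collations against the index (illustrative value);
         the default of 0 returns collations without checking them -->
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- Return at most one verified collation (illustrative value) -->
    <str name="spellcheck.maxCollations">1</str>
    <!-- Overrides applied only to the internal collation test queries:
         require every term of the candidate phrase to match -->
    <str name="spellcheck.collateParam.q.op">AND</str>
    <str name="spellcheck.collateParam.mm">100%</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

The same settings can instead be passed per request, for example by appending &spellcheck.maxCollationTries=5&spellcheck.collateParam.q.op=AND&spellcheck.collateParam.mm=100%25 to the query above (the % sign is URL-encoded as %25). With this in place, a collation such as 'red chiller' that matches no documents should be dropped rather than suggested.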
