
Solr4 - spellcheck issue with multi terms

I'm having trouble with spellcheck.

If I send a request with "wrd", spellcheck returns the suggestion I want: "word". But if I send a request with multiple terms, such as "wrd black", spellcheck returns correctlySpelled set to true. I want it to suggest "word black".

Note that if I send a request with "wrd blck", spellcheck returns the suggestions I want ("word black").

I don't think this is normal behaviour, but I can't find where the problem is.

Here is my solrconfig.xml:

<config>


  <requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
    <lst name="defaults">
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>
      <str name="spellcheck.maxCollationTries">15</str>
      <str name="spellcheck.maxCollations">10</str>
    </lst>
    <arr name="last-components">
      <str>spellcheck</str>
    </arr>
  </requestHandler>


<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
   <str name="queryAnalyzerFieldType">textSpell</str> 
   <lst name="spellchecker">
       <str name="name">default</str>
       <str name="field">spell</str>
       <str name="spellcheckIndexDir">./spellchecker</str>
       <str name="buildOnOptimize">true</str>
       <str name="buildOnCommit">true</str>
       <float name="thresholdTokenFrequency">.01</float>
   </lst>
</searchComponent>


</config>

And in my schema.xml:

<field name="spell" type="textSpell" indexed="true" stored="false" multiValued="true" />
<copyField source="attr_*" dest="spell" />
<fieldType name="textSpell" class="solr.TextField" positionIncrementGap="100" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory" />
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

Does anyone have any ideas?

There seems to be a bug when one of the query terms is spelled correctly and the spellcheck configuration has maxCollationTries greater than 1. I cannot say for sure that it is a bug; I am going through the code to find out.

Remove this setting from the default params of your handler:

<str name="spellcheck.maxCollationTries">15</str>

You can then pass it as a query parameter instead, spellcheck.maxCollationTries=15, and try again.
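
To compare the two cases, the request URL can be built with and without the parameter. This is only a minimal sketch: the host, port, and core name (`localhost:8983`, `collection1`) are assumptions, and `build_spellcheck_url` is a hypothetical helper, not part of Solr.

```python
from urllib.parse import urlencode

# Assumed Solr location; adjust host, port, and core name to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/collection1/select"


def build_spellcheck_url(query, max_collation_tries=None, base_url=SOLR_SELECT_URL):
    """Build a select URL, optionally passing maxCollationTries per request."""
    params = [
        ("q", query),
        ("spellcheck", "true"),
        ("spellcheck.collate", "true"),
    ]
    # Only send the parameter when asked, so both behaviours can be compared.
    if max_collation_tries is not None:
        params.append(("spellcheck.maxCollationTries", str(max_collation_tries)))
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_spellcheck_url("wrd black"))
    print(build_spellcheck_url("wrd black", max_collation_tries=15))
```

Requesting both URLs and comparing the spellcheck section of the responses should show whether the parameter changes the result for "wrd black".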
