Solr stop words replaced with _ symbol

Question

I have a problem with Solr stop words in my autosuggest: every stop word is replaced by the _ symbol.

For example, the field "deal_title" contains the text "the simple text in". When I search for the word "simple", Solr returns "_ simple text _", but I expect "simple text".

Could someone explain why it works this way and how to fix it? Here is part of my schema.xml:

<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" /> 
    </analyzer> 
    <analyzer type="query">
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/> 
        <tokenizer class="solr.StandardTokenizerFactory"/> 
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
    </analyzer>
</fieldType>

<field name="deal_title" type="text_auto" indexed="true" stored="true" required="false" multiValued="false"/>

<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="false"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Answer 1

My solution to this in Solr 6.3 (where enablePositionIncrements="false" is no longer possible) was to:

remove stop words
shingle with fillerToken="" (which removes the _)
remove leading and trailing spaces
remove duplicates

<filter class="solr.StopFilterFactory" format="snowball" words="lang/stopwords_de.txt" ignoreCase="true"/>
<filter class="solr.ShingleFilterFactory" fillerToken=""/>
<filter class="solr.PatternReplaceFilterFactory" pattern="(^ | $)" replacement=""/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
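To make the steps in this answer concrete, here is a sketch of how they might be applied to the asker's text_auto field type. It is an untested assumption about how the pieces fit together, not a verified configuration: the stop word file and shingle settings are copied from the question, and only fillerToken="" and the PatternReplaceFilterFactory line come from this answer. The _ appears because StopFilterFactory leaves a position gap where a stop word was removed, and ShingleFilterFactory fills such gaps with its filler token, which defaults to "_".

```xml
<!-- Sketch only: the question's text_auto type with this answer's filters added. -->
<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <charFilter class="solr.HTMLStripCharFilterFactory"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- Removes stop words but leaves a position gap where each one was. -->
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <!-- fillerToken="" makes the shingle filter fill those gaps with nothing instead of "_". -->
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false" fillerToken=""/>
        <!-- Trims the single space left at the start or end of a shingle. -->
        <filter class="solr.PatternReplaceFilterFactory" pattern="(^ | $)" replacement=""/>
        <!-- Runs last so that shingles made identical by the trimming can be dropped. -->
        <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    </analyzer>
</fieldType>
```

After changing an index-time analyzer, existing documents have to be reindexed before the new tokens show up. Note also that a stop word in the middle of a phrase would probably still leave a double space inside the shingle, which the pattern above does not collapse.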

Answer 2

To solve this problem, you need to use <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="false" /> together with <luceneMatchVersion>4.3</luceneMatchVersion>.
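As a rough sketch of where the two settings from this answer would live: luceneMatchVersion is set in solrconfig.xml, while the filter belongs in the analyzer of the field type in schema.xml. This assumes an older Solr release that still accepts enablePositionIncrements="false"; as Answer 1 notes, the attribute is no longer possible in Solr 6.3. The field type shown is the asker's text_auto, trimmed to the relevant lines.

```xml
<!-- solrconfig.xml (sketch): compatibility version under which the attribute is still accepted -->
<config>
    <luceneMatchVersion>4.3</luceneMatchVersion>
    <!-- rest of solrconfig.xml unchanged -->
</config>

<!-- schema.xml (sketch): stop filter that does not leave position gaps -->
<fieldType class="solr.TextField" name="text_auto">
    <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true" enablePositionIncrements="false"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.ShingleFilterFactory" maxShingleSize="3" outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
    </analyzer>
</fieldType>
```

With no position gaps left behind, the shingle filter has nothing to fill, so no _ placeholder is emitted.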

2 answers

solution1
2 2017-01-10 10:22:55

solution2
0 2015-02-12 13:44:03
