
Solr stop words don't seem to work: stop words are removed at index time, but not at query time in a proximity search

I am using Solr 8.2.0. I am trying to configure proximity search in Solr, but it doesn't seem to remove the stop words from the query.

<fieldType name="psearch" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
  </analyzer>
</fieldType>

I have listed the stop words in the stopwords.txt file in the config directory, and at index time Solr does remove them: they do not appear among the indexed terms.

I also checked the Analysis tab, and the stop words are removed there as well.

And here is the field:

<field name="pSearchField" type="psearch" indexed="true" stored="true" multiValued="false" />
<copyField source="example" dest="pSearchField"/>

But when I search with a proximity (slop) of 1, 2 or 3, it returns no results.

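This is a known problem in Solr 5 and later: the stop filter no longer offers a way to rewrite token positions (the old enablePositionIncrements="false" option is gone), so every removed stop word still leaves a gap in the positions of the remaining tokens. This issue, along with a few suggestions for working around it, is tracked in SOLR-6468.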
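To make the position gap concrete, here is a simplified, stdlib-only model of how a removed stop word still advances the position counter. It is not Lucene's StopFilter code, and the class and method names are hypothetical; it only mimics the idea that a skipped token's position increment is carried over to the next surviving token.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model (hypothetical, not Lucene code) of a stop filter that
// keeps position increments, so removed words leave gaps in the positions.
public class StopGapDemo {

    /** Returns "term@position" for every token that survives stop-word removal. */
    static List<String> analyze(List<String> tokens, Set<String> stopWords) {
        List<String> out = new ArrayList<>();
        int position = -1;        // the first token's increment of 1 puts it at position 0
        int pendingIncrement = 0; // increments accumulated since the last emitted token
        for (String token : tokens) {
            pendingIncrement++;   // every input token advances the position by one
            if (stopWords.contains(token)) {
                continue;         // dropped, but its increment is carried forward
            }
            position += pendingIncrement; // jumps by 2 or more after a skipped stop word
            pendingIncrement = 0;
            out.add(token + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the", "of");
        System.out.println(analyze(List.of("quick", "the", "brown", "fox"), stop)); // [quick@0, brown@2, fox@3]
        System.out.println(analyze(List.of("quick", "brown", "fox"), stop));        // [quick@0, brown@1, fox@2]
    }
}
```

In this model "brown" sits two positions after "quick" when "the" was in the text, but only one position after it otherwise, so phrase and proximity queries see a larger distance than the visible, stop-word-free text suggests.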

The easiest workaround is to add a MappingCharFilterFactory, but I'm wary of it because it also changes characters inside other words (i.e. a mapping "to" => "" would affect "veto" as well, not just "to"). This could possibly be handled with multiple PatternReplaceCharFilterFactory filters instead.
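A char filter helps here because it works on the raw characters before the tokenizer runs, so a word it deletes never becomes a token and reserves no position. PatternReplaceCharFilterFactory takes a Java regular expression in its pattern attribute, so the difference between a bare mapping and a whole-word pattern can be sketched with plain java.util.regex. This is not Solr code; the class and method names are hypothetical, and String.replace only stands in for a "to" => "" mapping rule.

```java
import java.util.regex.Pattern;

// Plain java.util.regex illustration (hypothetical names, not Solr code) of why
// a bare "to" => "" rule is risky and how word boundaries narrow it.
public class StopPatternDemo {

    // Naive substring removal, similar in spirit to a mapping rule "to" => ""
    static String naive(String text) {
        return text.replace("to", "");
    }

    // Whole-word removal: \b anchors the match at word boundaries,
    // so "to" inside "veto" or "town" is left alone
    private static final Pattern WHOLE_WORD_TO = Pattern.compile("\\bto\\b");

    static String wordBoundary(String text) {
        return WHOLE_WORD_TO.matcher(text).replaceAll("");
    }

    public static void main(String[] args) {
        String text = "veto the trip to town";
        System.out.println(naive(text));        // ve the trip  wn
        System.out.println(wordBoundary(text)); // veto the trip  town
    }
}
```

Since char filters run before the LowerCaseFilterFactory, such a pattern would presumably also need to be case-insensitive, for example with an inline (?i) flag.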

Another option, suggested in the ticket's comment thread, is a custom filter that rewrites the position increment of each token:

package filters;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

// Factory that Solr instantiates from the schema; it accepts no parameters.
public class RemoveTokenGapsFilterFactory extends TokenFilterFactory {

    public RemoveTokenGapsFilterFactory(Map<String, String> args) {
        super(args);
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveTokenGapsFilter(input);
    }

}

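// Closes the position gaps left by removed stop words, so the remaining
// tokens occupy consecutive positions.
final class RemoveTokenGapsFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);

    public RemoveTokenGapsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false;
        }
        // A skipped stop word shows up as an increment greater than 1; collapse it.
        // Increments of 0 (stacked tokens such as synonyms) are left untouched.
        if (posIncrAtt.getPositionIncrement() > 1) {
            posIncrAtt.setPositionIncrement(1);
        }
        return true;
    }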
}
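The arithmetic this filter performs on the increments a stop filter emits for "quick the brown fox" (with "the" as a stop word) can be modelled roughly as below. The names are hypothetical and this only mimics the numbers, not the Lucene TokenStream API. To use the real filter, the compiled class would presumably need to be on Solr's classpath (for example via a lib directive in solrconfig.xml) and referenced after the StopFilterFactory in both analyzers, e.g. as <filter class="filters.RemoveTokenGapsFilterFactory"/>.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model (hypothetical, not Lucene code) of what the gap-removing
// filter does to the position increments produced by a stop filter.
public class GapRemovalDemo {

    // Turns position increments into absolute positions (first token lands on 0).
    static List<Integer> positions(List<Integer> increments) {
        List<Integer> result = new ArrayList<>();
        int position = -1;
        for (int inc : increments) {
            position += inc;
            result.add(position);
        }
        return result;
    }

    // Collapses every gap (an increment greater than 1) to a single step,
    // mirroring setPositionIncrement(1) in the filter.
    static List<Integer> removeGaps(List<Integer> increments) {
        List<Integer> result = new ArrayList<>();
        for (int inc : increments) {
            result.add(inc > 1 ? 1 : inc);
        }
        return result;
    }

    public static void main(String[] args) {
        // Increments for quick, brown, fox after "the" was dropped
        List<Integer> afterStopFilter = List.of(1, 2, 1);
        System.out.println(positions(afterStopFilter));             // [0, 2, 3]
        System.out.println(positions(removeGaps(afterStopFilter))); // [0, 1, 2]
    }
}
```

With the gaps collapsed, "quick the brown fox" and "quick brown fox" end up with the same positions, at the cost of phrase queries no longer distinguishing the two.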

As far as I know, there is currently no perfect built-in solution to this issue.
