I am using Solr 8.2.0. I am trying to configure proximity search, but Solr doesn't seem to remove the stopwords from the query.
<fieldType name="psearch" class="solr.TextField" positionIncrementGap="100" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" />
  </analyzer>
</fieldType>
I have listed the stopwords in the stopwords.txt file in the directory. At index time Solr removes these words, as the indexed terms show (screenshot: indexed terms).
I also checked the Analysis tab, and the stopwords are removed there as well (screenshot: Analysis tab).
And here is the field:
<field name="pSearchField" type="psearch" indexed="true" stored="true" multiValued="false" />
<copyField source="example" dest="pSearchField"/>
But when I set the proximity to 1, 2, or 3, the query returns no results (screenshot: result).
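This is a known problem with Solr 5 and up: when the stop filter removes a token, it no longer rewrites the positions of the tokens that follow, so the removed stopword leaves a gap that still counts toward the distance between the remaining terms. The issue, with a few suggestions for how to work around it, is tracked in SOLR-6468.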
The easiest solution is to introduce a mapping char filter factory, but I'm skeptical of it because it replaces characters anywhere inside a string: mapping "to" => "" would also affect "veto", not just the word "to". This can possibly be handled with multiple PatternReplaceCharFilterFactories instead.
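As a rough sketch of the pattern-based idea, the fragment below removes whole stopwords before tokenization, so no token (and no position) is ever created for them. The word list in the pattern (`a`, `an`, `the`, `to`) is a hypothetical example and has to be kept in sync with stopwords.txt by hand; a single alternation is used here rather than one filter per word. The `\b` word boundaries are what keep "veto" intact.

```xml
<!-- Sketch only: the words inside the pattern are an assumed example list. -->
<fieldType name="psearch" class="solr.TextField" positionIncrementGap="100">
  <!-- An analyzer without a type attribute is used at both index and query time. -->
  <analyzer>
    <!-- Char filters run before the tokenizer. (?i) makes the match
         case-insensitive; \b restricts it to whole words. -->
    <charFilter class="solr.PatternReplaceCharFilterFactory"
                pattern="(?i)\b(a|an|the|to)\b\s*"
                replacement=""/>
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because the text is changed before tokenization, existing documents need to be reindexed for this to take effect.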
Another option shown in the thread for the ticket is to use a custom filter that rewrites the position data for each token:
package filters;

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;
import org.apache.lucene.analysis.util.TokenFilterFactory;

// Factory referenced from the schema; it wraps the incoming stream in the filter below.
public class RemoveTokenGapsFilterFactory extends TokenFilterFactory {

    public RemoveTokenGapsFilterFactory(Map<String, String> args) {
        super(args);
    }

    @Override
    public TokenStream create(TokenStream input) {
        return new RemoveTokenGapsFilter(input);
    }
}

// Forces every token to sit exactly one position after the previous one,
// which closes the gaps left behind by removed stopwords.
// Note: tokens with a position increment of 0 (e.g. synonyms) are flattened too.
final class RemoveTokenGapsFilter extends TokenFilter {

    private final PositionIncrementAttribute posIncrAtt = addAttribute(PositionIncrementAttribute.class);

    public RemoveTokenGapsFilter(TokenStream input) {
        super(input);
    }

    @Override
    public final boolean incrementToken() throws IOException {
        if (!input.incrementToken()) {
            return false; // end of stream
        }
        posIncrAtt.setPositionIncrement(1);
        return true;
    }
}
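To use such a filter, it would need to be compiled into a jar that Solr can load (for example, placed in the core's lib directory; that location is an assumption about your setup) and then added after the stop filter in both analyzers. A minimal sketch of the field type from the question with the filter wired in:

```xml
<fieldType name="psearch" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Must come after the stop filter so it can close the gaps that filter leaves. -->
    <filter class="filters.RemoveTokenGapsFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.ClassicTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <filter class="filters.RemoveTokenGapsFilterFactory"/>
  </analyzer>
</fieldType>
```

Since this changes the positions stored at index time, the affected documents have to be reindexed before proximity queries behave differently.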
As far as I know, there is currently no perfect built-in solution to this issue.