
SOLR stemming and stopwords

In the SOLR 3.5 text field type, the StopFilterFactory is listed before the PorterStemFilterFactory.

Does this mean that if I wanted to stop, for example, "game" and "games", I would have to add both to stopwords?

If so, would moving the StopFilterFactory after the PorterStemFilterFactory, and adding just "game" to stopwords, cause occurrences of both "game" and "games" to be stripped?

I guess the real question is: what is the best way to do this, and do I need to add all variations of the word to stopwords?

PorterStemFilterFactory provides aggressive stemming, and placing it before the stop word filter may prevent proper stop word removal, because the stemmed root may differ from the word you are trying to stop.
If you only need to handle plurals, you can use solr.EnglishMinimalStemFilterFactory before the stop word filter.
This handles the plurals, so the stop word list then only needs the singular form.
You can then add the PorterStemFilterFactory filter after the stop word filter to handle the full stemming.
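
The filter order described above might look roughly like this in schema.xml. This is only a sketch: the field type name text_en_stop, the choice of tokenizer, and the stopwords.txt file name are assumptions for illustration, not taken from the stock SOLR 3.5 schema.

```xml
<!-- Sketch only: the fieldType name, tokenizer choice, and stopwords file name are assumptions. -->
<fieldType name="text_en_stop" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Light stemming first: folds plurals such as "games" into "game" -->
    <filter class="solr.EnglishMinimalStemFilterFactory"/>
    <!-- The stop list then only needs the singular form, e.g. "game" -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
    <!-- Full stemming runs last, on whatever survived the stop filter -->
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
```

With a single analyzer element like this, the same chain applies at both index and query time. After changing the chain, existing documents would need to be reindexed for the new stop word behavior to take effect.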

