
Solr indexing, search stemming

Original post: 2012-02-23 01:18:28, tag: solr

I have an issue with an index on a set of staff records. The full-text index is based on each person's name and position.

I can search for a name like "john" without an issue, and part of a name like "anthon" also works.

However, some names don't search correctly: "anthony" returns no results, but "anth" returns all the Anthonys. Likewise, searching for "carly" returns nothing, but "car" does.

1 answer

As Maurico commented, stemming is not recommended for person names.
Stemming rewrites tokens according to rules meant for ordinary words, so it would cause a lot of unexpected results, at least for person names.

Also, it would be worth checking your schema.xml and the analysis applied to the field.
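One way to inspect the analysis is Solr's field analysis request handler, which reports the tokens produced at index time and at query time for a given field. The helper below is only a sketch that builds such a request URL. The base URL, the field name `name`, and the assumption that the handler is registered at `/analysis/field` in your solrconfig.xml are all hypothetical and need to be adapted to your setup.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, field_name, index_text, query_text):
    """Build a URL for Solr's field analysis handler.

    Assumes the handler is available at <base_url>/analysis/field;
    check solrconfig.xml, since that depends on your configuration.
    """
    params = {
        "analysis.fieldname": field_name,   # field whose analyzers are inspected
        "analysis.fieldvalue": index_text,  # text run through the index analyzer
        "analysis.query": query_text,       # text run through the query analyzer
        "wt": "json",
    }
    return f"{base_url.rstrip('/')}/analysis/field?{urlencode(params)}"


# Hypothetical host and field name, for illustration only.
URL = field_analysis_url("http://localhost:8983/solr", "name", "Anthony", "anthony")
print(URL)
```

Opening the resulting URL in a browser (or using the Analysis page of the admin UI) should show whether the index-time and query-time token streams for the same name actually agree.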

This issue can occur if you are using different analysis at index and query time.

From http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#Analyzers:

Analyzers are components that pre-process input text at index time and/or at search time. It's important to use the same or similar analyzers that process text in a compatible manner at index and query time. For example, if an indexing analyzer lowercases words, then the query analyzer should do the same to enable finding the indexed words.
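The lowercasing example from the wiki can be imitated with a few lines of Python. This is a toy model, not Solr code: the function names are made up for illustration, and whitespace splitting stands in for a real tokenizer.

```python
def index_analyzer(text):
    """Toy index-time analyzer: lowercase, then split on whitespace."""
    return text.lower().split()


def raw_query_analyzer(text):
    """Toy query-time analyzer that forgets to lowercase."""
    return text.split()


def matches(indexed_text, query, query_analyzer):
    """True if every analyzed query term is among the indexed terms."""
    indexed_terms = set(index_analyzer(indexed_text))
    return all(term in indexed_terms for term in query_analyzer(query))


# The index holds "john" and "smith"; an unlowercased "John" cannot match.
MISMATCHED = matches("John Smith", "John", raw_query_analyzer)
# Using the same analysis on both sides makes the terms line up.
COMPATIBLE = matches("John Smith", "John", index_analyzer)
print(MISMATCHED, COMPATIBLE)
```

The same reasoning applies to any filter in the chain: whatever transforms the indexed tokens has to be mirrored, or at least be compatible, on the query side.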

From the example you mentioned, you seem to have a stemmer on the field at index time, but the same does not seem to be applied in the query-time analysis.
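The symptoms in the question can be reproduced with a toy model of that situation. The sketch below is not Solr's stemmer: `toy_stem` implements a single rule modeled on one step of Porter-style stemming (a trailing "y" becomes "i"), and the staff records and function names are invented for illustration. It assumes the index was built with stemming, while the query term is looked up either unstemmed or as a prefix.

```python
def toy_stem(token):
    """Toy rule: trailing 'y' -> 'i' when the rest of the token contains a vowel."""
    if len(token) > 2 and token.endswith("y") and any(c in "aeiou" for c in token[:-1]):
        return token[:-1] + "i"
    return token


def analyze(text, stem):
    """Lowercase, split on whitespace, and optionally apply the toy stemmer."""
    tokens = text.lower().split()
    return [toy_stem(t) if stem else t for t in tokens]


def build_index(docs, stem):
    """Map each analyzed term to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in analyze(text, stem):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, stem):
    """Exact term lookup after analyzing the query."""
    hits = set()
    for term in analyze(query, stem):
        hits |= index.get(term, set())
    return hits


def prefix_search(index, prefix):
    """Prefix lookup against indexed terms, with no stemming of the prefix."""
    hits = set()
    for term, doc_ids in index.items():
        if term.startswith(prefix.lower()):
            hits |= doc_ids
    return hits


# Made-up staff records; the index is built WITH stemming.
DOCS = {1: "Anthony Smith Manager", 2: "Carly Jones Analyst", 3: "John Brown Director"}
INDEX = build_index(DOCS, stem=True)

print(search(INDEX, "john", stem=False))     # unaffected by the toy rule
print(search(INDEX, "anthony", stem=False))  # index holds "anthoni", so no match
print(prefix_search(INDEX, "anth"))          # prefix still matches "anthoni"
print(search(INDEX, "anthony", stem=True))   # same analysis on both sides matches
```

In this model "john" is untouched by the stemmer and always matches, "anthony" is stored as "anthoni" and can only be found by a prefix such as "anth" or "anthon", and applying the same stemming at query time restores the match. For person names, the simpler fix suggested above is to drop the stemmer from the field altogether and reindex.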