
Searching with Lucene with stemming enabled

Suppose I store a set of strings in Lucene, with each document holding a single word. Given an input word W, I would like to retrieve not only the documents that match W exactly, but also those whose stemmed form matches W.

Also, given an input word W, I want to handle the case where a document matches the stemmed version of W.

Would writing my own custom analyzer that returns a PorterStemFilter suffice? Do I just need to write this class and reference it as the analyzer in my code?

Writing a custom Analyzer that has a stemmer in its analysis chain should suffice.

Here is sample code that uses PorterStemFilter in Lucene 4.1:

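import java.io.Reader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.Tokenizer;
import org.apache.lucene.analysis.core.LowerCaseTokenizer;
import org.apache.lucene.analysis.en.PorterStemFilter;
import org.apache.lucene.util.Version;
class MyAnalyzer extends Analyzer {
  @Override
  protected TokenStreamComponents createComponents(String fieldName, Reader reader) {
    // Version.LUCENE_41 is assumed to match the 4.1 API; PorterStemFilter needs lowercased input.
    Tokenizer source = new LowerCaseTokenizer(Version.LUCENE_41, reader);
    return new TokenStreamComponents(source, new PorterStemFilter(source));
  }
}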

Please note that you MUST use the same custom Analyzer at query time that you used at indexing time. Otherwise the query terms are not stemmed the same way as the indexed terms and may fail to match.
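To illustrate why the analyzer has to be shared, here is a minimal sketch that uses only the Java standard library, not Lucene. The class SameAnalyzerDemo and its analyze method are hypothetical stand-ins for an index and an analyzer chain, and the suffix stripping is a toy rule, not the Porter algorithm. It only shows that a query matches stemmed index terms when it passes through the same analysis step.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Toy in-memory index: one analysis function must serve both indexing and querying. */
public class SameAnalyzerDemo {

    // Maps each analyzed term to the original words that produced it.
    private final Map<String, List<String>> postings = new HashMap<>();

    // Hypothetical stand-in for an analyzer chain: lowercases, then strips one
    // common English suffix. This is NOT the Porter algorithm.
    public static String analyze(String word) {
        String w = word.toLowerCase(Locale.ROOT);
        for (String suffix : new String[] {"ing", "ed", "es", "s"}) {
            if (w.endsWith(suffix) && w.length() - suffix.length() >= 3) {
                return w.substring(0, w.length() - suffix.length());
            }
        }
        return w;
    }

    // Indexing stores the word under its analyzed form.
    public void index(String word) {
        postings.computeIfAbsent(analyze(word), k -> new ArrayList<>()).add(word);
    }

    // Correct: the query goes through the same analysis as the indexed words.
    public List<String> search(String query) {
        return postings.getOrDefault(analyze(query), new ArrayList<>());
    }

    // Incorrect: the raw query is compared against stemmed index terms.
    public List<String> searchWithoutAnalysis(String query) {
        return postings.getOrDefault(query, new ArrayList<>());
    }

    public static void main(String[] args) {
        SameAnalyzerDemo idx = new SameAnalyzerDemo();
        for (String w : new String[] {"walk", "Walking", "walked", "talks"}) {
            idx.index(w);
        }
        System.out.println(idx.search("walks"));                // prints [walk, Walking, walked]
        System.out.println(idx.searchWithoutAnalysis("walks")); // prints []
    }
}
```

In Lucene the same idea applies when you pass one analyzer instance (or two identically configured ones) to both the index writer and the query parser.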

You may find sample code for your version of Lucene in the corresponding PorterStemFilter documentation.
