Why is my version of a case-insensitive Lucene keyword analyzer not working?

I am trying to index documents for case-insensitive search using KeywordTokenizer.

I have created a custom Analyzer that is supposed to do keyword tokenisation as well as convert all keywords to lowercase:

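// Lucene 5.x package layout assumed: KeywordTokenizer and LowerCaseFilter come from analyzers-common.
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;

public class LowercasingKeywordAnalyzer extends Analyzer {

   @Override
   protected TokenStreamComponents createComponents(String fieldName) {
      // Emit the entire field value as a single token...
      KeywordTokenizer keywordTokenizer = new KeywordTokenizer();
      // ...and lowercase that token before it is indexed.
      return new TokenStreamComponents(keywordTokenizer, new LowerCaseFilter(keywordTokenizer));
   }
}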
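One way to check that the analyzer itself does what is intended is to run it over a value and collect the terms it emits. This is a minimal sketch, assuming Lucene 5.x package names (matching the APIs used in the question); the class `AnalyzerTokens` and its `tokens` helper are hypothetical names for illustration, and the analyzer is copied in as a nested class so the example compiles on its own.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

// Hypothetical helper class; assumes Lucene 5.x (analyzers-common on the classpath).
public class AnalyzerTokens {

    // Same analyzer as in the question: whole value as one token, lowercased.
    static class LowercasingKeywordAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            KeywordTokenizer tokenizer = new KeywordTokenizer();
            return new TokenStreamComponents(tokenizer, new LowerCaseFilter(tokenizer));
        }
    }

    // Runs the analyzer over `text` and returns the terms it would index for `field`.
    static List<String> tokens(Analyzer analyzer, String field, String text) {
        List<String> result = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();                       // required before the first incrementToken()
            while (ts.incrementToken()) {
                result.add(term.toString());
            }
            ts.end();                         // finish consuming the stream
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        try (Analyzer analyzer = new LowercasingKeywordAnalyzer()) {
            System.out.println(tokens(analyzer, "fieldname", "This is the text to be indexed."));
        }
    }
}
```

For the sample sentence this prints a single lowercased token, `[this is the text to be indexed.]`, which indicates the analyzer itself behaves as intended.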

Why does the search return no results when I submit a TermQuery with all criteria terms lowercased? Here is a unit test reproducing the issue:

@Test
public void experiment() throws IOException, ParseException {
   Analyzer analyzer = new LowercasingKeywordAnalyzer();

   Directory directory = new RAMDirectory();
   IndexWriterConfig config = new IndexWriterConfig(analyzer);
   IndexWriter iwriter = new IndexWriter(directory, config);

   Document doc = new Document();
   String text = "This is the text to be indexed.";
   doc.add(new StringField("fieldname", text, Store.NO));

   iwriter.addDocument(doc);
   iwriter.close();

   // Now search the index:
   DirectoryReader ireader = DirectoryReader.open(directory);
   IndexSearcher isearcher = new IndexSearcher(ireader);

   //THE TEST PASSES WITH THE CASE SENSITIVE QUERY TERM, BUT DOES NOT PASS WITH LOWERCASED
   //Query query = new TermQuery(new Term("fieldname", "This is the text to be indexed."));
   Query query = new TermQuery(new Term("fieldname", "This is the text to be indexed.".toLowerCase()));


   ScoreDoc[] hits = isearcher.search(query, null, 1000).scoreDocs;
   assertEquals(1, hits.length);

   ireader.close();
   directory.close();
}

Please help me identify what is wrong here.

NOTE: I am aware of Lucene QueryParsers as well as the deprecation of some interfaces; please do not bother commenting on this.

StringField is not analyzed, so no analyzer you define will affect it. The whole value is indexed verbatim as a single term ("This is the text to be indexed."), which is why the original-case TermQuery matches and the lowercased one does not. You can use a TextField instead, or a Field with your own FieldType that has tokenization enabled. Or just lowercase the value before constructing the field and continue to use StringField.
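The options above can be compared side by side. This is a minimal sketch, assuming Lucene 5.x (the same `RAMDirectory` and `IndexWriterConfig(Analyzer)` APIs the question uses); `CaseInsensitiveKeywordFix`, `lowercasedTermHits` and `analyzedKeywordField` are hypothetical names. Each call indexes one document and counts the hits for the lowercased TermQuery.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Locale;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.core.KeywordTokenizer;
import org.apache.lucene.analysis.core.LowerCaseFilter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.Field.Store;
import org.apache.lucene.document.FieldType;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.index.Term;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.TermQuery;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

// Hypothetical demo class; assumes Lucene 5.x.
public class CaseInsensitiveKeywordFix {

    static final String FIELD = "fieldname";
    static final String TEXT = "This is the text to be indexed.";

    // The question's analyzer: whole value as one token, lowercased.
    static class LowercasingKeywordAnalyzer extends Analyzer {
        @Override
        protected TokenStreamComponents createComponents(String fieldName) {
            KeywordTokenizer tokenizer = new KeywordTokenizer();
            return new TokenStreamComponents(tokenizer, new LowerCaseFilter(tokenizer));
        }
    }

    // Indexes one document holding `field`, then counts hits for the lowercased TermQuery.
    static int lowercasedTermHits(Field field) {
        try (Directory dir = new RAMDirectory();
             Analyzer analyzer = new LowercasingKeywordAnalyzer()) {
            try (IndexWriter writer = new IndexWriter(dir, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(field);
                writer.addDocument(doc);
            } // closing the writer commits the document
            try (DirectoryReader reader = DirectoryReader.open(dir)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                // TermQuery is not analyzed: it must equal the indexed term exactly.
                TermQuery query = new TermQuery(new Term(FIELD, TEXT.toLowerCase(Locale.ROOT)));
                return searcher.search(query, 10).scoreDocs.length;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // StringField's settings, but tokenized, so the field's analyzer runs at index time.
    static Field analyzedKeywordField(String value) {
        FieldType type = new FieldType(StringField.TYPE_NOT_STORED);
        type.setTokenized(true);
        type.freeze();
        return new Field(FIELD, value, type);
    }

    public static void main(String[] args) {
        // The question's setup: StringField bypasses the analyzer, so no match.
        System.out.println(lowercasedTermHits(new StringField(FIELD, TEXT, Store.NO)));
        // TextField is passed through LowercasingKeywordAnalyzer.
        System.out.println(lowercasedTermHits(new TextField(FIELD, TEXT, Store.NO)));
        // Field with a custom, tokenized FieldType.
        System.out.println(lowercasedTermHits(analyzedKeywordField(TEXT)));
        // Lowercase the value up front and keep StringField.
        System.out.println(lowercasedTermHits(new StringField(FIELD, TEXT.toLowerCase(Locale.ROOT), Store.NO)));
    }
}
```

Running `main` prints 0 for the original StringField and 1 for each of the three alternatives. Lowercasing up front avoids depending on the analyzer at all, but every query must then apply the same lowercasing; `Locale.ROOT` is used here so that `toLowerCase` does not vary with the default locale.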