How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2?

Question

I realise that 3.0.2 is an old version of Lucene, but suppose I have the following Java code:

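import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

int nGramLength = 3;
Set<String> stopWords = new HashSet<String>(); // Set is an interface, so instantiate HashSet
stopWords.add("the");
stopWords.add("and");
...
SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);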

This generates n-gram frequencies from a string of text with the stop words removed. How can I disable the LowerCaseFilter that forms part of SnowballAnalyzer? I want to preserve the case of the generated n-grams so that I can perform various counts according to the presence or absence of upper-case characters in them.

I am something of a Lucene newbie. I should also add that upgrading the version of Lucene is not an option here.
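The counting step described in the question could be sketched in plain Java as follows. This is only an illustration, assuming the case-preserved n-grams have already been collected as strings; the class and method names (`NgramCaseCounter`, `hasUpperCase`, `countByCase`) and the sample n-grams are hypothetical and not part of Lucene.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Lucene: splits n-gram counts by case.
public class NgramCaseCounter {

    /** Returns true if at least one character of the n-gram is upper case. */
    static boolean hasUpperCase(String ngram) {
        for (int i = 0; i < ngram.length(); i++) {
            if (Character.isUpperCase(ngram.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    /** Counts how many n-grams contain an upper-case character and how many do not. */
    static Map<String, Integer> countByCase(List<String> ngrams) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("withUpperCase", 0);
        counts.put("allLowerCase", 0);
        for (String ngram : ngrams) {
            String key = hasUpperCase(ngram) ? "withUpperCase" : "allLowerCase";
            counts.put(key, counts.get(key) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample n-grams, standing in for the analyzer's output.
        List<String> ngrams = Arrays.asList("New York", "york city", "big Apple", "red apple");
        System.out.println(countByCase(ngrams)); // prints {withUpperCase=2, allLowerCase=2}
    }
}
```

A count like this is only meaningful if the analyzer leaves case intact, which is why the LowerCaseFilter has to go.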

Answer 1

SnowballAnalyzer is a convenience class for using SnowballFilter. The LowerCaseFilter is baked into its code.

Just copy the SnowballAnalyzer source and remove line 103: streams.result = new LowerCaseFilter(streams.result);
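As an alternative to editing a copy of the source, the same filter chain can be rebuilt in a small analyzer of your own. The following is a minimal sketch, assuming the Lucene 3.0.x core and the contrib snowball jar are on the classpath. The class name `CasePreservingSnowballAnalyzer` is hypothetical, and the chain (StandardTokenizer, StandardFilter, StopFilter, SnowballFilter) is my reading of what SnowballAnalyzer does minus the lower-casing step, so check it against your copy of the source.

```java
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical class, not part of Lucene: SnowballAnalyzer's chain without LowerCaseFilter.
public final class CasePreservingSnowballAnalyzer extends Analyzer {
    private final Version matchVersion;
    private final String stemmerName; // e.g. "English"
    private final Set<?> stopWords;   // may be null for "no stop words"

    public CasePreservingSnowballAnalyzer(Version matchVersion, String stemmerName, Set<?> stopWords) {
        this.matchVersion = matchVersion;
        this.stemmerName = stemmerName;
        this.stopWords = stopWords;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(matchVersion, reader);
        result = new StandardFilter(result);
        // No LowerCaseFilter here, so tokens keep their original case.
        if (stopWords != null) {
            // ignoreCase = true is an assumption of this sketch: it lets "The"
            // still match the stop word "the" now that tokens are not lower-cased.
            result = new StopFilter(
                    StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
                    result, stopWords, true);
        }
        return new SnowballFilter(result, stemmerName);
    }
}
```

It can then be passed to ShingleAnalyzerWrapper in place of SnowballAnalyzer. One caveat: Snowball stemmers are written with lower-case input in mind, so stems for capitalised tokens may differ from those of their lower-case forms.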

0 votes, accepted, 2014-11-10 14:25:40
