
How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2?

I realise that 3.0.2 is an old version of Lucene, but suppose I have the following Java code:

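import java.util.HashSet;
import java.util.Set;
import org.apache.lucene.analysis.shingle.ShingleAnalyzerWrapper;
import org.apache.lucene.analysis.snowball.SnowballAnalyzer;
import org.apache.lucene.util.Version;

int nGramLength = 3;  // maximum shingle (n-gram) size
Set<String> stopWords = new HashSet<String>();  // Set is an interface, so instantiate a HashSet
stopWords.add("the");
stopWords.add("and");
// ...
SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
// Wrap the analyzer so it emits word n-grams (shingles) of up to nGramLength tokens
ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);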

This generates n-gram frequencies for a particular string of text with the stop words removed. How can I disable the LowerCaseFilter that forms part of SnowballAnalyzer? I want to preserve the case of the generated n-grams so that I can perform various counts according to the presence or absence of upper-case characters in them.
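Once case survives analysis, the counting described above is straightforward. The sketch below is plain Java and independent of Lucene; it assumes the n-grams and their frequencies have already been collected into a Map<String, Integer>, and the names CaseCounts, hasUpperCase and splitByCase are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of Lucene: totals n-gram frequencies by whether
// the n-gram contains at least one upper-case character.
class CaseCounts {

    // True if any character in s is upper case.
    static boolean hasUpperCase(String s) {
        for (int i = 0; i < s.length(); i++) {
            if (Character.isUpperCase(s.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Returns totals under the keys "withUpper" and "withoutUpper".
    static Map<String, Integer> splitByCase(Map<String, Integer> ngramCounts) {
        Map<String, Integer> totals = new LinkedHashMap<String, Integer>();
        totals.put("withUpper", 0);
        totals.put("withoutUpper", 0);
        for (Map.Entry<String, Integer> e : ngramCounts.entrySet()) {
            String bucket = hasUpperCase(e.getKey()) ? "withUpper" : "withoutUpper";
            totals.put(bucket, totals.get(bucket) + e.getValue());
        }
        return totals;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("New York", 3);
        counts.put("new idea", 2);
        counts.put("Lucene", 1);
        System.out.println(splitByCase(counts)); // {withUpper=4, withoutUpper=2}
    }
}
```

If the analyzer still lower-cases its output, every n-gram lands in the "withoutUpper" bucket, which is why the LowerCaseFilter has to go.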

I am something of a Lucene newbie, and I should add that upgrading Lucene is not an option here.

Answer: SnowballAnalyzer is a convenience class for using SnowballFilter, and the LowerCaseFilter is hard-coded into it, so there is no setting that turns it off.

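Just copy the SnowballAnalyzer source into a class of your own and delete line 103:

streams.result = new LowerCaseFilter(streams.result);

Then pass an instance of the copied class to ShingleAnalyzerWrapper in place of SnowballAnalyzer. The class's tokenStream() method builds the same filter chain, so check it for a LowerCaseFilter line as well.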
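To see what removing that line changes, the following plain-Java model (not Lucene code; ChainModel and analyze are hypothetical names) roughly mimics the chain: tokenize, optionally lower-case, drop stop words, then emit shingles of up to maxShingleSize tokens. It skips stemming and ignores the position gaps Lucene leaves where stop words were removed. One consequence worth noting: without LowerCaseFilter, a stop-word set containing "the" no longer matches "The", so the model lower-cases only for the stop-word lookup. In Lucene itself, StopFilter in the 3.x line accepts an ignoreCase flag that may serve the same purpose; check the 3.0.2 javadoc before relying on it.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Plain-Java model of the analysis chain, NOT Lucene code. Stemming is left out.
class ChainModel {

    // lowerCase=true mimics the chain with LowerCaseFilter; false mimics the
    // copied analyzer with that line deleted. stopWords must be lower case.
    static List<String> analyze(String text, Set<String> stopWords,
                                boolean lowerCase, int maxShingleSize) {
        List<String> kept = new ArrayList<String>();
        // 1. Tokenize on runs of non-letters (a rough stand-in for StandardTokenizer).
        for (String raw : text.split("[^\\p{L}]+")) {
            if (raw.isEmpty()) {
                continue;
            }
            // 2. Optional lower-casing step.
            String token = lowerCase ? raw.toLowerCase(Locale.ROOT) : raw;
            // 3. Case-insensitive stop-word check, so "The" is removed even when case is kept.
            if (!stopWords.contains(token.toLowerCase(Locale.ROOT))) {
                kept.add(token);
            }
        }
        // 4. Emit every run of 1..maxShingleSize adjacent tokens, joined by spaces.
        List<String> shingles = new ArrayList<String>();
        for (int start = 0; start < kept.size(); start++) {
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxShingleSize && start + n < kept.size(); n++) {
                if (n > 0) {
                    sb.append(' ');
                }
                sb.append(kept.get(start + n));
                shingles.add(sb.toString());
            }
        }
        return shingles;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("the", "and"));
        String text = "The Lucene index and Solr";
        // [lucene, lucene index, index, index solr, solr]
        System.out.println(analyze(text, stop, true, 2));
        // [Lucene, Lucene index, index, index Solr, Solr]
        System.out.println(analyze(text, stop, false, 2));
    }
}
```

The second line of output is what the copied analyzer is meant to achieve: the same n-grams, with their original capitalisation intact.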
