How to disable LowerCaseFilter when using SnowballAnalyzer in Lucene 3.0.2?
I realise that 3.0.2 is an old version of Lucene, but if I have Java code as follows:
int nGramLength = 3;
Set<String> stopWords = new HashSet<String>();
stopWords.add("the");
stopWords.add("and");
...
SnowballAnalyzer snowballAnalyzer = new SnowballAnalyzer(Version.LUCENE_30, "English", stopWords);
ShingleAnalyzerWrapper shingleAnalyzer = new ShingleAnalyzerWrapper(snowballAnalyzer, nGramLength);
which will generate the frequency of n-grams from a particular string of text without stop words, how can I disable the LowerCaseFilter that forms part of the SnowballAnalyzer? I want to preserve the case of the n-grams generated so that I can perform various counts according to the presence or absence of upper-case characters in the n-grams.
I am something of a Lucene newbie. I should add that upgrading the version of Lucene is not an option here.
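To make the goal concrete, the case-based counting described in the question might look like the following once case-preserved n-grams are available. This is only a minimal sketch in plain Java: `NgramCaseCounter`, `hasUpperCase`, `countByCase` and the bucket names are hypothetical, not part of Lucene, and the n-grams are assumed to arrive as plain strings.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper for illustration; not part of Lucene.
class NgramCaseCounter {

    // Returns true if the n-gram contains at least one upper-case character.
    static boolean hasUpperCase(String ngram) {
        for (int i = 0; i < ngram.length(); i++) {
            if (Character.isUpperCase(ngram.charAt(i))) {
                return true;
            }
        }
        return false;
    }

    // Tallies n-grams into two buckets: "withUpper" and "withoutUpper".
    static Map<String, Integer> countByCase(List<String> ngrams) {
        Map<String, Integer> counts = new LinkedHashMap<String, Integer>();
        counts.put("withUpper", 0);
        counts.put("withoutUpper", 0);
        for (String ngram : ngrams) {
            String key = hasUpperCase(ngram) ? "withUpper" : "withoutUpper";
            counts.put(key, counts.get(key) + 1);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> ngrams = Arrays.asList("New York City", "the quick fox", "IBM");
        System.out.println(countByCase(ngrams)); // prints {withUpper=2, withoutUpper=1}
    }
}
```

A count like this only makes sense if the analyzer has not already lower-cased the tokens, which is why the LowerCaseFilter has to go.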
The Snowball analyzer is a convenience class for using SnowballFilter. LowerCaseFilter is baked into the code.
Just copy the SnowballAnalyzer source and remove line 103: streams.result = new LowerCaseFilter(streams.result);
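As a rough illustration of what such a copy could look like, here is a minimal sketch of an analyzer that rebuilds the same filter chain without the lower-casing step. It assumes Lucene 3.0.x core plus the snowball contrib jar on the classpath. The class name `CasePreservingSnowballAnalyzer` is hypothetical, and the chain (StandardTokenizer, StandardFilter, StopFilter, SnowballFilter) is written from memory of the 3.0.x source, so the constructor signatures are worth checking against the 3.0.2 Javadoc. If memory serves, the original class applies LowerCaseFilter in both tokenStream and reusableTokenStream, so a literal copy would need it removed in both places. This sketch sidesteps that by overriding only tokenStream.

```java
import java.io.Reader;
import java.util.Set;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.StopFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.snowball.SnowballFilter;
import org.apache.lucene.analysis.standard.StandardFilter;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.util.Version;

// Hypothetical class name; a trimmed-down variant of SnowballAnalyzer
// that leaves token case untouched.
public final class CasePreservingSnowballAnalyzer extends Analyzer {

    private final Version matchVersion;
    private final String stemmerName;   // e.g. "English"
    private final Set<String> stopSet;  // may be null for no stop-word removal

    public CasePreservingSnowballAnalyzer(Version matchVersion, String stemmerName,
                                          Set<String> stopSet) {
        this.matchVersion = matchVersion;
        this.stemmerName = stemmerName;
        this.stopSet = stopSet;
    }

    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream result = new StandardTokenizer(matchVersion, reader);
        result = new StandardFilter(result);
        // LowerCaseFilter deliberately omitted so that token case is preserved.
        if (stopSet != null) {
            // ignoreCase = true: "The" is still removed although nothing is lower-cased.
            result = new StopFilter(
                    StopFilter.getEnablePositionIncrementsVersionDefault(matchVersion),
                    result, stopSet, true);
        }
        result = new SnowballFilter(result, stemmerName);
        return result;
    }
}
```

An instance can then be passed to ShingleAnalyzerWrapper in place of snowballAnalyzer, as in the question. One caveat: the Snowball stemmers are written with lower-case input in mind, so capitalised tokens may be stemmed differently from their lower-case forms, or not at all.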