
How can I use Lucene's ShingleAnalyzerWrapper + StandardAnalyzer + IndexReader?

Original 2011-05-11 14:10:57 8 1 java/ lucene/ tokenize/ full-text-indexing/ frequency-analysis

I hope you can help me with this problem. What I intend to do: given a text, I want to count the frequency of every stemmed token n-gram, without the stopwords (in other words, with the stopwords already removed).

This is the situation: I am indexing some texts with IndexWriter using ShingleAnalyzerWrapper + StandardAnalyzer, and I add each document to the IndexWriter like this: indexwriter.addDocument(doc, analyzer); where analyzer is, again, ShingleAnalyzerWrapper + StandardAnalyzer.

But the problem is: when I get the terms and their frequencies, the stopwords seem to have been replaced by underscores.

This is the input:
String text = "to i want to to i want to linked";
String text2 = "super by by hard easy ";

If anything is unclear, please ask and I will try to explain it better.

Thanks for the help

1 answer

Please see http://www.lucidimagination.com/search/document/e5681676403a007b/can_i_omit_shinglefilter_s_filler_tokens for some solutions.

In this case you probably want to disable position increments on your StopFilter: you don't want to introduce a "hole" where the stopword was, you want to pretend the stopwords never existed.
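To make the effect of those "holes" concrete, here is a small library-free Java sketch. It does not use Lucene's API; it only mimics the idea, and it may not match Lucene's exact shingle output. The class and method names (ShingleHoleDemo, removeStopwords, bigrams, frequencies) are hypothetical, and the stop set is assumed to contain just "to" and "by" for the sake of the question's input. With keepHoles set to true, each removed stopword leaves a "_" placeholder that ends up inside the bigrams; with keepHoles set to false, the remaining tokens are treated as adjacent, which is what disabling position increments is meant to achieve.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class ShingleHoleDemo {
    // Placeholder standing in for a removed stopword (an assumption of this sketch).
    static final String FILLER = "_";

    // Removes stopwords. If keepHoles is true, every removed word leaves a
    // FILLER placeholder behind; otherwise the survivors become adjacent.
    static List<String> removeStopwords(List<String> tokens, Set<String> stop, boolean keepHoles) {
        List<String> out = new ArrayList<>();
        for (String t : tokens) {
            if (!stop.contains(t)) {
                out.add(t);
            } else if (keepHoles) {
                out.add(FILLER);
            }
        }
        return out;
    }

    // Builds word bigrams from neighbouring entries of the token list.
    static List<String> bigrams(List<String> tokens) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < tokens.size(); i++) {
            out.add(tokens.get(i) + " " + tokens.get(i + 1));
        }
        return out;
    }

    // Counts how often each n-gram occurs (sorted by n-gram for stable output).
    static Map<String, Integer> frequencies(List<String> ngrams) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String g : ngrams) {
            counts.merge(g, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> tokens = Arrays.asList("to i want to to i want to linked".split(" "));
        Set<String> stop = new HashSet<>(Arrays.asList("to", "by")); // assumed stop set

        // Holes kept: placeholders show up inside the bigrams.
        System.out.println(frequencies(bigrams(removeStopwords(tokens, stop, true))));
        // Holes dropped: only real tokens are paired.
        System.out.println(frequencies(bigrams(removeStopwords(tokens, stop, false))));
    }
}
```

In the second call the only bigrams left are "i want" (twice), "want i" and "want linked", so no underscore terms would be counted.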



solution1 0 ACCEPTED 2011-05-12 15:27:17