How can I use Lucene's ShingleAnalyzerWrapper + StandardAnalyzer + IndexReader?

Original post: 2011-05-11 14:10:57 0 1 java/ lucene/ tokenize/ full-text-indexing/ frequency-analysis

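I hope you can help me with this. What I intend to do: given a valid text, I want to count the frequency of every n-gram of stemmed tokens, that is, tokens whose suffixes have been removed.

Here is the situation: I am indexing some text with an IndexWriter using ShingleAnalyzerWrapper + StandardAnalyzer, and I pass the analyzer again when I add each document to the IndexWriter (e.g. indexwriter.addDocument(doc, analyzer); where analyzer is again ShingleAnalyzerWrapper + StandardAnalyzer).

The problem is: when I retrieve the terms and their frequencies, the stop words appear to have been replaced by underscores.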

This is the input:
String text = "to i want to link to i";
String text2 = "super by by hard easy";

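This is the output:
term:  | freq: 6
term: _ | freq: 2
term: _ hard | freq: 1
term: _ i | freq: 2
term: _ link | freq: 1
term: easy | freq: 1
term: hard | freq: 1
term: hard easy | freq: 1
term: i | freq: 2
term: i want | freq: 2
term: link | freq: 1
term: super | freq: 1
term: super _ | freq: 1
term: want | freq: 2
term: want _ | freq: 2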

If anything is unclear, please ask and I will clarify.

Thanks for your help.

1 Answer

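See http://www.lucidimagination.com/search/document/e5681676403a007b/can_i_omit_shinglefilter_s_filler_tokens for some solutions.

In your case it looks like you want to disable position increments on the StopFilter: you do not want to leave "holes" where the stop words used to be, but rather to pretend the stop words never existed at all.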
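
To make the effect of those "holes" concrete, here is a minimal plain-Java sketch. It is a simulation under stated assumptions, not Lucene code: the class ShingleHoleDemo and its methods are hypothetical, the stop list contains only "by", and only unigrams and bigrams are produced. It uses text2 from the question and compares the counts when removed stop words leave a gap (written as "_" inside a bigram) with the counts when the gap is closed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Plain-Java simulation for illustration only; this is NOT Lucene code.
class ShingleHoleDemo {

    // Removes stop words. With keepHoles=true each removed word leaves a null
    // placeholder (a position gap); with keepHoles=false the gap is closed.
    static List<String> removeStopWords(List<String> tokens, Set<String> stopWords, boolean keepHoles) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (!stopWords.contains(token)) {
                out.add(token);
            } else if (keepHoles) {
                out.add(null);
            }
        }
        return out;
    }

    // Counts unigrams and bigrams, sorted by term. A hole never counts as a
    // unigram, but it is written as "_" when it is part of a bigram.
    static Map<String, Integer> countShingles(List<String> positions) {
        Map<String, Integer> freq = new TreeMap<>();
        for (int i = 0; i < positions.size(); i++) {
            String cur = positions.get(i);
            if (cur != null) {
                freq.merge(cur, 1, Integer::sum);
            }
            if (i + 1 < positions.size()) {
                String next = positions.get(i + 1);
                String bigram = (cur == null ? "_" : cur) + " " + (next == null ? "_" : next);
                freq.merge(bigram, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        // text2 from the question; treating "by" as the only stop word is an assumption.
        List<String> tokens = Arrays.asList("super by by hard easy".split(" "));
        Set<String> stopWords = new HashSet<>(Arrays.asList("by"));

        System.out.println("holes kept:   " + countShingles(removeStopWords(tokens, stopWords, true)));
        System.out.println("holes closed: " + countShingles(removeStopWords(tokens, stopWords, false)));
    }
}
```

With the holes closed, the underscore terms disappear, but "super hard" is now counted as a bigram even though the two words were not adjacent in the text. Whether that trade-off is acceptable depends on what the frequencies are used for.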
