
Get highest frequency terms from Lucene index

Original post: 2010-05-12 19:00:40. Tags: java, lucene, full-text-search, indexing, frequency

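I need to extract the terms with the highest frequencies from several Lucene indexes, so that I can use them for some semantic analysis.

So I want to get perhaps the top 30 most frequently occurring terms (I have not decided on the threshold yet; I will analyze the results first) together with their per-index counts. I am aware that I might lose some precision because of potentially dropped duplicates, but for now, let's say I am fine with that.

As for proposed solutions, speed is not important (needless to say, maybe), since this will be a static analysis. I would put the emphasis on simplicity of implementation, because I am not very experienced with Lucene and cannot wrap my head around some of its concepts.

I cannot find any code samples for anything similar, so concrete advice (code, pseudocode, a link to a code sample...) is what I would appreciate most.

Thanks!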

2 answers

A very simple way is to use Luke. On the "Overview" tab there is a "Show top terms" button that does what you need.

Take a look at this: http://sujitpal.blogspot.com/2009/02/summarization-with-lucene.html

The class on that page has a computeTopTermQuery method, which you should be able to adapt to multiple indexes without much trouble.
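For the "multiple indexes" part, the selection step can be sketched in plain Java, independent of any Lucene version. This is only a minimal sketch under stated assumptions: it assumes the term counts of each index have already been read into a `Map<String, Integer>` (how to enumerate terms depends on the Lucene version and is not shown here), and the class and method names (`TopTerms`, `sumAcrossIndexes`, `topN`) are hypothetical, not taken from Lucene or from the linked page. It sums the counts across indexes, keeps the N largest with a bounded min-heap, and then prints each top term with its per-index counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper: not part of Lucene.
public class TopTerms {

    /** Adds up the count of every term over all per-index maps. */
    public static Map<String, Integer> sumAcrossIndexes(List<Map<String, Integer>> perIndexCounts) {
        Map<String, Integer> total = new HashMap<>();
        for (Map<String, Integer> counts : perIndexCounts) {
            for (Map.Entry<String, Integer> e : counts.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Integer::sum);
            }
        }
        return total;
    }

    /** Returns the n entries with the highest counts, highest first. */
    public static List<Map.Entry<String, Integer>> topN(Map<String, Integer> counts, int n) {
        if (n <= 0) {
            return new ArrayList<>();
        }
        // Min-heap by count: the weakest of the current candidates sits at the head.
        Comparator<Map.Entry<String, Integer>> byCountAsc =
                (a, b) -> Integer.compare(a.getValue(), b.getValue());
        PriorityQueue<Map.Entry<String, Integer>> heap = new PriorityQueue<>(byCountAsc);
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            heap.offer(e);
            if (heap.size() > n) {
                heap.poll(); // drop the smallest so at most n entries remain
            }
        }
        // Sort the survivors by count descending, then by term for a stable order.
        List<Map.Entry<String, Integer>> result = new ArrayList<>(heap);
        result.sort((a, b) -> {
            int c = Integer.compare(b.getValue(), a.getValue());
            return c != 0 ? c : a.getKey().compareTo(b.getKey());
        });
        return result;
    }

    public static void main(String[] args) {
        // Made-up counts standing in for two indexes.
        Map<String, Integer> indexA = new HashMap<>();
        indexA.put("lucene", 40);
        indexA.put("index", 25);
        indexA.put("query", 10);
        Map<String, Integer> indexB = new HashMap<>();
        indexB.put("lucene", 30);
        indexB.put("search", 20);
        indexB.put("index", 15);

        List<Map<String, Integer>> all = Arrays.asList(indexA, indexB);
        for (Map.Entry<String, Integer> e : topN(sumAcrossIndexes(all), 3)) {
            // Print the overall count followed by the count in each index.
            System.out.println(e.getKey() + " total=" + e.getValue()
                    + " A=" + indexA.getOrDefault(e.getKey(), 0)
                    + " B=" + indexB.getOrDefault(e.getKey(), 0));
        }
    }
}
```

With a threshold of 30 the heap never holds more than 30 entries, so the full term list of an index does not have to be sorted. When several terms tie at the cut-off, which of them is kept is arbitrary in this sketch.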

