How can I get the top words through Lucene indexing and search?

Question

I used the Lucene library to create an index and run searches. Now I want to get the 30 words that appear most frequently in my texts. How can I do that?

Answer 1

If you are using Lucene 4.0 or later, you can use the HighFreqTerms class, for example:

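import org.apache.lucene.misc.HighFreqTerms;
import org.apache.lucene.misc.TermStats;

// reader is an open IndexReader; "mytextfield" is the indexed field to inspect
TermStats[] commonTerms = HighFreqTerms.getHighFreqTerms(reader, 30, "mytextfield");
for (TermStats commonTerm : commonTerms) {
    System.out.println(commonTerm.termtext.utf8ToString()); // or whatever you need to do with it
}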

From each TermStats object, you can get the frequencies (docFreq and totalTermFreq), the field name (field), and the term text (termtext).
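
If you want to see the idea without the Lucene dependency, here is a minimal plain-Java sketch of a top-N word count. It is not how HighFreqTerms is implemented; the class name TopWords, the method topWords, and the naive split on non-word characters are all assumptions made for illustration, and a real index would use whatever analyzer you configured.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical helper, not part of Lucene.
public class TopWords {

    /** Returns up to n (word, count) entries, most frequent first. */
    public static List<Map.Entry<String, Long>> topWords(List<String> texts, int n) {
        // Count every lowercase token; splitting on \W+ is a naive stand-in for an analyzer.
        Map<String, Long> counts = new HashMap<>();
        for (String text : texts) {
            for (String token : text.toLowerCase().split("\\W+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1L, Long::sum);
                }
            }
        }

        // Orders entries from "worst" to "best": lower count first,
        // and on equal counts the alphabetically later word first.
        Comparator<Map.Entry<String, Long>> worstFirst = (a, b) -> {
            int byCount = Long.compare(a.getValue(), b.getValue());
            return byCount != 0 ? byCount : b.getKey().compareTo(a.getKey());
        };

        // Min-heap capped at n entries, so memory stays bounded by n.
        PriorityQueue<Map.Entry<String, Long>> heap = new PriorityQueue<>(worstFirst);
        for (Map.Entry<String, Long> entry : counts.entrySet()) {
            heap.offer(entry);
            if (heap.size() > n) {
                heap.poll(); // drop the current worst entry
            }
        }

        // Emit the survivors best-first.
        List<Map.Entry<String, Long>> result = new ArrayList<>(heap);
        result.sort(worstFirst.reversed());
        return result;
    }
}
```

Calling TopWords.topWords(texts, 30) on a list of your documents would give the 30 most frequent words with their counts. For anything already in a Lucene index, HighFreqTerms is the better fit because it reads the term statistics the index already stores.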

Answer 2

A quick search on Stack Overflow turned up this question: Get highest frequency terms from Lucene index

Would this work for you? It sounds like exactly the same question.

2 answers

Answer 1
1 2013-10-03 19:12:43

Answer 2
0 2013-10-03 17:45:07
