How to get top words by lucene index and search?

I used the Lucene library to create an index and search it. Now I want to get the top 30 words, that is, the words that appear most often in my texts. What can I do?

If you are using Lucene 4.0 or later, you can use the HighFreqTerms class, for example:

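import org.apache.lucene.misc.HighFreqTerms;
import org.apache.lucene.misc.TermStats;

// reader is an open IndexReader; "mytextfield" is the indexed field to inspect
TermStats[] commonTerms = HighFreqTerms.getHighFreqTerms(reader, 30, "mytextfield");
for (TermStats commonTerm : commonTerms) {
    System.out.println(commonTerm.termtext.utf8ToString()); // or whatever you need to do with it
}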

From each TermStats object, you can get the frequencies (docFreq and totalTermFreq), the field name, and the term text.
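If you want to sanity-check what "top N words" should look like on a small sample, here is a minimal plain-Java sketch that counts words without Lucene. It is not how HighFreqTerms works internally, and the class name TopWords, the method topWords, and the tokenization rule (lowercase, split on anything that is not a letter or digit) are all assumptions made for illustration. Because a Lucene analyzer may tokenize differently, the counts can differ from what the index reports.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Lucene: counts words in raw strings.
final class TopWords {

    // Returns up to n distinct words, most frequent first; ties are broken alphabetically.
    static List<Map.Entry<String, Integer>> topWords(List<String> texts, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            // Naive tokenization (assumption): lowercase, split on non-letter/non-digit runs.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        List<String> texts = List.of("the quick brown fox", "the lazy dog", "the fox");
        for (Map.Entry<String, Integer> e : topWords(texts, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

For the original question you would call topWords(texts, 30). For anything beyond a quick check, reading the terms from the index as shown above is the better fit, since it reuses the analysis already done at indexing time.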

A quick search on SO got me this: Get highest frequency terms from Lucene index

Would this work for you? It sounds like exactly the same question.
