How to get the top words from a Lucene index?
I use the Lucene library to create an index and search it. Now I want to get the 30 words that appear most often in my texts. What can I do?
If you are using Lucene 4.0 or later, you can use the HighFreqTerms class, for example:
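import org.apache.lucene.misc.HighFreqTerms;
import org.apache.lucene.misc.TermStats;

// reader is an open IndexReader; "mytextfield" is the indexed field to inspect
TermStats[] commonTerms = HighFreqTerms.getHighFreqTerms(reader, 30, "mytextfield");
for (TermStats commonTerm : commonTerms) {
    System.out.println(commonTerm.termtext.utf8ToString()); // or whatever you need to do with it
}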
From each TermStats object, you can get the frequencies, field name, and text.
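If it helps to see what "top terms by frequency" amounts to, here is a minimal plain-Java sketch that counts words over a few in-memory strings and keeps the n most frequent. It is not how Lucene does it and does not touch the index: the class TopTerms, the method topTerms, and the crude split-on-non-alphanumerics tokenizer are all assumptions made for illustration, and a real Lucene analyzer will tokenize differently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Lucene.
final class TopTerms {

    // Returns up to n (term, count) pairs, most frequent term first.
    static List<Map.Entry<String, Integer>> topTerms(List<String> texts, int n) {
        Map<String, Integer> counts = new HashMap<>();
        for (String text : texts) {
            // Assumed tokenizer: lowercase, then split on anything that is not a letter or digit.
            for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
                if (!token.isEmpty()) {
                    counts.merge(token, 1, Integer::sum);
                }
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        // Highest count first; ties are broken alphabetically so the order is deterministic.
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        List<String> texts = Arrays.asList(
                "Lucene indexes text", "Lucene searches text fast", "text");
        for (Map.Entry<String, Integer> e : topTerms(texts, 2)) {
            System.out.println(e.getKey() + " " + e.getValue());
        }
    }
}
```

This counts every occurrence of a word, so a word repeated many times in one text ranks high. Counting the number of texts that contain the word instead would be a different ranking, so pick whichever matches what you mean by "top".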
A quick search on SO turned up this: Get highest frequency terms from Lucene index
Would this work for you? It sounds like exactly the same question.