How to extract the term frequency of each word from a Lucene 5.2.1 index using Java?
I have code that used to work with a previous Lucene version, but it does not work anymore. I think most of the code on the Internet is for previous versions of Lucene.
You can get the total term frequency of a given term from IndexReader.totalTermFreq, for example:
Term myTerm = new Term("contentfield", "myterm");
long totaltf = myReader.totalTermFreq(myTerm);
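To make clear what that number means: totalTermFreq is the total count of occurrences of the term across all documents, which is not the same as the number of documents containing it (IndexReader.docFreq reports that). Below is a minimal sketch of the difference in plain Java, with no Lucene involved. The class name TermFreqSketch and the representation of documents as token lists are assumptions made purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration class; not part of Lucene.
public class TermFreqSketch {

    // Total number of occurrences of `term` across all documents
    // (the quantity that totalTermFreq reports).
    public static long totalTermFreq(List<List<String>> docs, String term) {
        long total = 0;
        for (List<String> doc : docs) {
            for (String token : doc) {
                if (token.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }

    // Number of documents that contain `term` at least once
    // (the quantity that docFreq reports).
    public static int docFreq(List<List<String>> docs, String term) {
        int count = 0;
        for (List<String> doc : docs) {
            if (doc.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<List<String>> docs = Arrays.asList(
            Arrays.asList("lucene", "index", "lucene"),
            Arrays.asList("index", "reader"),
            Arrays.asList("lucene"));
        System.out.println("ttf=" + totalTermFreq(docs, "lucene")); // prints ttf=3
        System.out.println("df=" + docFreq(docs, "lucene"));        // prints df=2
    }
}
```

In Lucene itself, note that totalTermFreq can return -1 when the codec does not record this statistic, and it does not take deleted documents into account.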
If you want to iterate over all the terms in the index and get the frequency of each, you can use MultiFields for that:
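import java.util.Iterator;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// `reader` is your open IndexReader
Fields fields = MultiFields.getFields(reader);
Iterator<String> fieldsIter = fields.iterator();
while (fieldsIter.hasNext()) {
    String fieldname = fieldsIter.next();
    // Enumerate every term indexed for this field
    TermsEnum terms = fields.terms(fieldname).iterator();
    BytesRef term;
    while ((term = terms.next()) != null) {
        System.out.println(fieldname + ":" + term.utf8ToString() + " ttf:" + terms.totalTermFreq());
        // Or whatever else you want to do with it...
    }
}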