
Extract term frequency for each word in a Lucene 5.2.1 index using Java

How can I extract the term frequency of each word from a Lucene 5.2.1 index using Java?

I have code that worked with a previous Lucene version, but it no longer works. I think most of the code on the Internet targets earlier versions of Lucene.

You can get the total frequency of a given term across all documents from IndexReader.totalTermFreq, for example:

Term myTerm = new Term("contentfield", "myterm");
long totaltf = myReader.totalTermFreq(myTerm);
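To make concrete what that number means, here is a minimal plain-Java sketch with no Lucene dependency. It contrasts the total term frequency (every occurrence, in every document) with the document frequency (how many documents contain the term, which IndexReader.docFreq reports). The class name TermFrequencySketch, the sample documents, and the lowercase whitespace tokenization are assumptions made for illustration; in a real index the tokens depend on the analyzer used at indexing time.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical illustration class; not part of Lucene.
final class TermFrequencySketch {

    // Counts every occurrence of the term across all documents,
    // which is the quantity a total term frequency describes.
    static long totalTermFreq(List<String> docs, String term) {
        long total = 0;
        for (String doc : docs) {
            // Assumption: tokens are lowercase words separated by whitespace.
            for (String token : doc.toLowerCase(Locale.ROOT).split("\\s+")) {
                if (token.equals(term)) {
                    total++;
                }
            }
        }
        return total;
    }

    // Counts the documents that contain the term at least once,
    // which is the quantity a document frequency describes.
    static int docFreq(List<String> docs, String term) {
        int count = 0;
        for (String doc : docs) {
            List<String> tokens = Arrays.asList(doc.toLowerCase(Locale.ROOT).split("\\s+"));
            if (tokens.contains(term)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> docs = Arrays.asList(
                "lucene indexes text",
                "lucene searches lucene indexes",
                "plain text");
        System.out.println("ttf=" + totalTermFreq(docs, "lucene")); // prints ttf=3
        System.out.println("df=" + docFreq(docs, "lucene"));        // prints df=2
    }
}
```

The term "lucene" occurs three times but in only two documents, so the two measures differ whenever a term repeats within a document.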

If you want to iterate over all the terms in the index and get the frequency of each, you can use MultiFields:

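import java.util.Iterator;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.index.TermsEnum;
import org.apache.lucene.util.BytesRef;

// reader is an already opened IndexReader
Fields fields = MultiFields.getFields(reader);
Iterator<String> fieldsIter = fields.iterator();
while (fieldsIter.hasNext()) {
    String fieldname = fieldsIter.next();
    // Enumerate every term indexed for this field
    TermsEnum terms = fields.terms(fieldname).iterator();
    BytesRef term;
    while ((term = terms.next()) != null) {
        // totalTermFreq() is the occurrence count of the current term across all documents
        System.out.println(fieldname + ":" + term.utf8ToString() + " ttf:" + terms.totalTermFreq());
        // Or whatever else you want to do with it...
    }
}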
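As one example of "whatever else you want to do with it", the frequencies can be collected into a map inside the loop and ranked afterwards. The sketch below is plain Java with no Lucene dependency. The class name TopTermsSketch, the "field:term" key format, and the sample counts are hypothetical and only stand in for values gathered from the loop above. It avoids lambdas so that it also compiles on Java 7.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper; not part of Lucene.
final class TopTermsSketch {

    // Returns up to n entries ordered by descending frequency,
    // breaking ties alphabetically by key.
    static List<Map.Entry<String, Long>> topTerms(Map<String, Long> ttfByTerm, int n) {
        List<Map.Entry<String, Long>> entries =
                new ArrayList<Map.Entry<String, Long>>(ttfByTerm.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Long>>() {
            @Override
            public int compare(Map.Entry<String, Long> a, Map.Entry<String, Long> b) {
                int byFreq = Long.compare(b.getValue(), a.getValue());
                return byFreq != 0 ? byFreq : a.getKey().compareTo(b.getKey());
            }
        });
        int limit = Math.max(0, Math.min(n, entries.size()));
        return new ArrayList<Map.Entry<String, Long>>(entries.subList(0, limit));
    }

    public static void main(String[] args) {
        // Sample data standing in for counts collected while iterating the index.
        Map<String, Long> ttfByTerm = new HashMap<String, Long>();
        ttfByTerm.put("contentfield:lucene", 3L);
        ttfByTerm.put("contentfield:text", 2L);
        ttfByTerm.put("contentfield:indexes", 2L);
        ttfByTerm.put("contentfield:plain", 1L);
        for (Map.Entry<String, Long> e : topTerms(ttfByTerm, 2)) {
            System.out.println(e.getKey() + " ttf:" + e.getValue());
        }
    }
}
```

Inside the Lucene loop, the map would be filled with something like ttfByTerm.put(fieldname + ":" + term.utf8ToString(), terms.totalTermFreq()). For a very large index, holding every term in memory may not be practical, so a bounded priority queue could be a better fit.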

