I want to use Lucene in a Java application to calculate word support and confidence. I have over 500 .txt documents and an ArrayList containing two terms, term i and term j.
The formula for confidence:
Dti-tj / Dti
Dti-tj: number of documents that contain both term i and term j
Dti: number of documents that contain term i
The formula for support:
Dti-tj / D
Dti-tj: number of documents that contain both term i and term j
D: total number of documents in the collection
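To make the two formulas concrete, here is a minimal sketch that applies them to a tiny in-memory collection using only the Java standard library. The class `SupportConfidenceSketch`, its methods, and the four sample documents are all hypothetical and exist only for illustration; splitting on whitespace is a stand-in for whatever tokenization is actually used.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Lucene: counts documents held in memory.
class SupportConfidenceSketch {

    // Number of documents whose token set contains every given term.
    static int countDocsWithAll(List<Set<String>> docs, String... terms) {
        int count = 0;
        for (Set<String> doc : docs) {
            if (doc.containsAll(Arrays.asList(terms))) {
                count++;
            }
        }
        return count;
    }

    // support = Dti-tj / D
    static double support(List<Set<String>> docs, String termI, String termJ) {
        return (double) countDocsWithAll(docs, termI, termJ) / docs.size();
    }

    // confidence = Dti-tj / Dti
    static double confidence(List<Set<String>> docs, String termI, String termJ) {
        return (double) countDocsWithAll(docs, termI, termJ)
                / countDocsWithAll(docs, termI);
    }

    // Lower-cases and splits on whitespace; an assumed stand-in for a real analyzer.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\s+")));
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
                tokens("lucene builds an index"),
                tokens("lucene search"),
                tokens("an index of words"),
                tokens("plain text"));
        // D = 4, Dti = 2 ("lucene"), Dti-tj = 1 ("lucene" and "index")
        System.out.println("support = " + support(docs, "lucene", "index"));       // 0.25
        System.out.println("confidence = " + confidence(docs, "lucene", "index")); // 0.5
    }
}
```

With term i = "lucene" and term j = "index", one of the four documents contains both terms and two contain term i, so support is 1/4 and confidence is 1/2.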
Is it possible to use Lucene to search for and count these terms? Which classes do I have to use?
I would simply search for your two terms, term i and term j, and get your counts from the totalHits returned by each search: one search for term i alone, and one for documents that contain both terms.
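import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

// Written against the older API where BooleanQuery is mutable and totalHits is an int;
// newer Lucene versions use BooleanQuery.Builder instead.
int docCount = indexReader.numDocs();  // D
IndexSearcher searcher = new IndexSearcher(indexReader);
Query queryI = new TermQuery(new Term(fieldName, termI));
Query queryJ = new TermQuery(new Term(fieldName, termJ));
// MUST on both clauses: match only documents that contain term i AND term j.
BooleanQuery queryIJ = new BooleanQuery();
queryIJ.add(new BooleanClause(queryI, BooleanClause.Occur.MUST));
queryIJ.add(new BooleanClause(queryJ, BooleanClause.Occur.MUST));
// totalHits is the full match count, so only one hit needs to be collected.
int countI = searcher.search(queryI, 1).totalHits;    // Dti
int countIJ = searcher.search(queryIJ, 1).totalHits;  // Dti-tj
float confidence = (float) countIJ / (float) countI;
float support = (float) countIJ / (float) docCount;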
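One edge case in the divisions above: if term i matches no documents, countI is 0 and the float division 0/0 yields NaN; the same happens to support on an empty index. A minimal sketch of a guard, using a hypothetical `SafeRatio.ratio` helper that is not part of Lucene. Returning 0 for an empty denominator is an assumed convention; depending on the application it may be better to skip such term pairs entirely.

```java
// Hypothetical helper: a ratio that maps an empty denominator to 0 instead of NaN.
class SafeRatio {

    static float ratio(long numerator, long denominator) {
        if (denominator == 0) {
            return 0f; // assumed convention: no matching documents means a ratio of 0
        }
        return (float) numerator / (float) denominator;
    }

    public static void main(String[] args) {
        System.out.println(ratio(1, 2)); // confidence when countIJ = 1, countI = 2
        System.out.println(ratio(0, 0)); // term i absent from every document
    }
}
```

With this helper, the last two lines of the snippet would become `ratio(countIJ, countI)` and `ratio(countIJ, docCount)`.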