How to count word support and confidence with Lucene

I want to use Lucene in a Java application to calculate word support and confidence. I have over 500 .txt documents and an ArrayList that contains two terms, term i and term j.

The formula for confidence:

Confidence = D(ti,tj) / D(ti)

D(ti,tj) = number of documents that contain both term i and term j
D(ti) = number of documents that contain term i

The formula for support:

Support = D(ti,tj) / D

D(ti,tj) = number of documents that contain both term i and term j
D = total number of documents in the collection

Is it possible to use Lucene to search for and count these terms? Which classes should I use?

I would simply run two searches, one for term i alone and one for documents that contain both term i and term j, and take the counts from the totalHits value each search returns.

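// Assumes the older Lucene API (3.x/4.x), where BooleanQuery is mutable and totalHits is an int.
import org.apache.lucene.index.Term;
import org.apache.lucene.search.BooleanClause;
import org.apache.lucene.search.BooleanQuery;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.TermQuery;

int docCount = indexReader.numDocs();                 // D
IndexSearcher searcher = new IndexSearcher(indexReader);

// TermQuery is not analyzed: termI and termJ must match the indexed form (e.g. lowercased).
Query queryI = new TermQuery(new Term(fieldName, termI));
Query queryJ = new TermQuery(new Term(fieldName, termJ));

// Both clauses are MUST so that only documents containing term i AND term j match.
BooleanQuery queryIJ = new BooleanQuery();
queryIJ.add(queryI, BooleanClause.Occur.MUST);
queryIJ.add(queryJ, BooleanClause.Occur.MUST);

// totalHits counts every match, so only one result needs to be fetched.
int countI = searcher.search(queryI, 1).totalHits;    // D(ti)
int countIJ = searcher.search(queryIJ, 1).totalHits;  // D(ti,tj)

// Guard against division by zero when term i is absent or the index is empty.
float confidence = countI == 0 ? 0f : (float) countIJ / countI;
float support = docCount == 0 ? 0f : (float) countIJ / docCount;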
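
To sanity-check the numbers Lucene returns, the same two ratios can be computed without an index on a small sample. The following is only a sketch in plain Java: the class name SupportConfidence, its helper methods, and the naive tokenizer (lowercase, split on non-word characters) are assumptions made for illustration and are not part of Lucene. With a real index, the tokens depend on the analyzer used at indexing time, so results may differ.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SupportConfidence {

    // Number of documents whose token set contains every one of the given terms.
    static int countContaining(List<Set<String>> docs, String... terms) {
        int count = 0;
        for (Set<String> doc : docs) {
            if (doc.containsAll(Arrays.asList(terms))) {
                count++;
            }
        }
        return count;
    }

    // Support = D(ti,tj) / D. Returns 0 for an empty collection.
    static float support(List<Set<String>> docs, String termI, String termJ) {
        if (docs.isEmpty()) {
            return 0f;
        }
        return (float) countContaining(docs, termI, termJ) / docs.size();
    }

    // Confidence = D(ti,tj) / D(ti). Returns 0 when term i occurs in no document.
    static float confidence(List<Set<String>> docs, String termI, String termJ) {
        int countI = countContaining(docs, termI);
        if (countI == 0) {
            return 0f;
        }
        return (float) countContaining(docs, termI, termJ) / countI;
    }

    // Naive tokenizer (an assumption): lowercase, then split on non-word characters.
    static Set<String> tokens(String text) {
        return new HashSet<>(Arrays.asList(text.toLowerCase().split("\\W+")));
    }

    public static void main(String[] args) {
        List<Set<String>> docs = Arrays.asList(
                tokens("apache lucene search"),
                tokens("lucene index"),
                tokens("java search"),
                tokens("lucene search engine"));
        // 2 of the 4 documents contain both terms; 3 documents contain "lucene".
        System.out.println("support = " + support(docs, "lucene", "search"));
        System.out.println("confidence = " + confidence(docs, "lucene", "search"));
    }
}
```

In this sample, two of the four documents contain both terms, so the support is 2/4, and three documents contain "lucene", so the confidence is 2/3. Note that confidence is not symmetric: swapping term i and term j changes the denominator.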
