
Search an Apache Lucene index and count the results per group

I am trying to search a Lucene index, but I want to filter the search. There are two fields, contents and category. Suppose I want to search for files that contain "sports", and I also want to count how many of those files are in category a and how many are in category b. I am trying to achieve this with the following code, but with millions of records it becomes slow because of the loop. Please suggest another way to do this.

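import java.io.File;
import javax.swing.JOptionPane;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.util.Version;

try {
    File indexDir = new File("path of the file");
    Directory directory = FSDirectory.open(indexDir);
    IndexSearcher searcher = new IndexSearcher(directory, true);

    int maxhits = 1000000;
    QueryParser parser1 = new QueryParser(Version.LUCENE_36, "contents",
            new StandardAnalyzer(Version.LUCENE_36));
    Query qu = parser1.parse("sport");

    // search(query, n) keeps a priority queue of n entries, so a huge maxhits is itself costly
    TopDocs topDocs = searcher.search(qu, maxhits);
    ScoreDoc[] hits = topDocs.scoreDocs;
    int len = hits.length;
    JOptionPane.showMessageDialog(null, "found " + len + " times");

    int ctr = 0, ctr1 = 0;
    // Loading every hit's stored fields just to read its category is the slow part
    for (int i = 0; i < len; i++) {
        Document d = searcher.doc(hits[i].doc);
        String category = d.get("category");
        if ("a".equals(category))
            ctr++;
        else if ("b".equals(category))
            ctr1++;
    }

    JOptionPane.showMessageDialog(null, "word found in category a " + ctr + " times");
    JOptionPane.showMessageDialog(null, "word found in category b " + ctr1 + " times");
} catch (Exception ex) {
    ex.printStackTrace();
}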

You could just run a query for each category you are looking for and read totalHits. Better still, use a TotalHitCountCollector instead of getting a TopDocs instance:

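import org.apache.lucene.search.TotalHitCountCollector;

// Count matches for "sport" in category a; the collector only counts, no documents are loaded.
// Caution: StandardAnalyzer removes the stop word "a", so "category:a" can silently drop out of
// the parsed query; a TermQuery on a not-analyzed category field avoids that.
Query query = parser1.parse("+sport +category:a");
TotalHitCountCollector collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr = collector.getTotalHits();

// Same for category b, with a fresh collector
query = parser1.parse("+sport +category:b");
collector = new TotalHitCountCollector();
searcher.search(query, collector);
ctr1 = collector.getTotalHits();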
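With many categories, one query per category repeats the matching work. A single-pass alternative, offered here only as a sketch under assumptions, is a custom Collector that reads each hit's category from a per-document array instead of calling searcher.doc(), since loading stored fields for every hit is likely the expensive part of the original loop. In Lucene 3.x, FieldCache.DEFAULT.getStrings(reader, "category") returns such a String[] per reader, assuming category is indexed as a single untokenized term per document. The plain-Java code below models only the lookup-and-tally step that would sit inside the Collector's collect(int doc). The class CategoryTally and its method are hypothetical, and the two arrays stand in for the hit doc numbers and the FieldCache array.

```java
import java.util.Map;
import java.util.TreeMap;

/**
 * Hypothetical helper: tallies matching documents per category in one pass.
 * hitDocs stands in for the doc numbers a Lucene Collector receives in collect(int doc);
 * categoryByDoc stands in for the per-document array from FieldCache (an assumption).
 */
final class CategoryTally {

    static Map<String, Integer> countByCategory(int[] hitDocs, String[] categoryByDoc) {
        Map<String, Integer> counts = new TreeMap<>();
        for (int doc : hitDocs) {
            // Array lookup instead of loading the document's stored fields
            String category = categoryByDoc[doc];
            if (category == null) {
                continue; // document has no category term
            }
            counts.merge(category, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] categoryByDoc = {"a", "b", "b", "a", null};
        int[] hitDocs = {0, 2, 3, 4};
        System.out.println(countByCategory(hitDocs, categoryByDoc)); // prints {a=2, b=1}
    }
}
```

In a real 3.x Collector, setNextReader(IndexReader reader, int docBase) would be the natural place to fetch the array for each segment, because the doc numbers passed to collect are relative to that segment.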

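Statement: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.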

 
Guangdong ICP registration No. 18138465  © 2020-2024 STACKOOM.COM