
How does the Solr clustering component work?

I was looking into Solr's default clustering component for Carrot2 (I am in the process of writing my own). In the clustering component class, the clustering algorithm is called in two methods:

  • in the overridden process method:

     SolrDocumentList solrDocList = SolrPluginUtils.docListToSolrDocumentList(
         results.docList, rb.req.getSearcher(), engine.getFieldsToLoad(rb.req), docIds);
     Object clusters = engine.cluster(rb.getQuery(), solrDocList, docIds, rb.req);
     rb.rsp.add("clusters", clusters);

  • and once again in the finishStage method:

     Map<SolrDocument,Integer> docIds = null;
     Object clusters = engine.cluster(rb.getQuery(), solrDocList, docIds, rb.req);
     rb.rsp.add("clusters", clusters);

Now my question: the process method works not on the complete query result but on the individual shards, and the finishStage method runs once all the results have been aggregated. Why, then, do we call the clustering algorithm twice and add its output to the response both times? Am I missing something?

Clustering component code here.

P.S. Please correct me if I am wrong.

Looks like a bug to me (and I see it's actually called twice in distributed mode). I'll look into this; see SOLR-10678 to track it.
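To make the double invocation concrete, here is a toy model of the two hooks. It is only a sketch and uses no Solr classes: ToyClusteringComponent, runDistributed and runSingleNode are hypothetical names, and the model assumes that process runs once on every node that executes the query locally, while finishStage runs once on the coordinating node after the shard responses have been merged (the real request flow has several stages, which this ignores).

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of the request flow; none of these types are Solr classes. */
class ClusteringCallSketch {

    /** Hypothetical stand-in for a component with the two hooks from the question. */
    static class ToyClusteringComponent {
        // Records every point at which clustering would be invoked.
        final List<String> calls = new ArrayList<>();

        // Assumed to run on each node that executes the query locally.
        void process(String node) {
            calls.add("process@" + node);
        }

        // Assumed to run on the coordinator once shard results are merged.
        void finishStage(String node) {
            calls.add("finishStage@" + node);
        }
    }

    /** Distributed request: every shard clusters, then the coordinator clusters again. */
    static List<String> runDistributed(List<String> shards) {
        ToyClusteringComponent component = new ToyClusteringComponent();
        for (String shard : shards) {
            component.process(shard);
        }
        component.finishStage("coordinator");
        return component.calls;
    }

    /** Single-node request: only process is involved in this model. */
    static List<String> runSingleNode() {
        ToyClusteringComponent component = new ToyClusteringComponent();
        component.process("node");
        return component.calls;
    }

    public static void main(String[] args) {
        System.out.println(runDistributed(List.of("shard1", "shard2")));
        System.out.println(runSingleNode());
    }
}
```

Under these assumptions, only the coordinator's call sees the aggregated results; the per-shard calls cluster partial result lists, which is the redundant work the question is pointing at.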
