
Performing a distributed search inside a Solr component prepare method

I'm writing a custom Solr component. In the component's prepare method I execute a query passed as a custom parameter (inside req.params). This is not the q parameter query but a separate input query defined in a custom parameter. I use the documents returned by that custom query to do some preparation work in the prepare method.

The problem is that my index is distributed across several shards, so the documents returned by the custom query are only the ones residing on a single shard. In other words, the search performed in my prepare method is not distributed, and I'm getting partial results. This is more or less how I perform the search in my prepare method:

rb.req.getSearcher().getDocList(customQuery, null, null, offset, len, 0);

Is there a way to run a distributed search in the prepare method and get the matching documents from all the shards?


EDIT:

My current solution is to execute the query using SolrJ, roughly as follows:

SolrServer server = new HttpSolrServer(url);
SolrQuery request = new SolrQuery(customQuery);
NamedList<Object> queryResponse = server.query(request).getResponse();

Then I parse the response to get the content of the returned documents. I don't like this solution for several reasons. One is that I have to parse the response myself. The main one is that I have to pass the Solr server url as a parameter; I currently put the url in the solrconfig.xml file. Is it possible to construct a SolrServer instance without explicitly stating the Solr server url (perhaps through ZooKeeper)?

The Easy Way

Use CloudSolrServer to execute the distributed query. Feed it the ZooKeeper address and the collection name, both of which can be reached through the response builder:

CoreDescriptor coreDescriptor = rb.req.getCore().getCoreDescriptor();
String collectionName = coreDescriptor.getCloudDescriptor().getCollectionName();    
ZkController zkController = coreDescriptor.getCoreContainer().getZkController();    
String zookeeperUrl = zkController.getZkServerAddress();

CloudSolrServer server = new CloudSolrServer(zookeeperUrl);
server.setDefaultCollection(collectionName);
server.connect();

SolrRequest request = ... // initialize the Solr request to execute the query
NamedList<Object> solrResponse = server.request(request);
// do whatever you like with the returned response;
server.shutdown();

The Right Way

Do not perform a distributed search inside the prepare method, and don't query the index there at all. Instead, first decide at which stage of the execution you want your distributed query to run. The stages are STAGE_START, STAGE_PARSE_QUERY, STAGE_TOP_GROUPS, STAGE_EXECUTE_QUERY, STAGE_GET_FIELDS and STAGE_DONE. If you need it to run between two of these stages, create a new intermediate stage (such as EXECUTE_PREPARING_QUERY).
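The stages are plain int constants on ResponseBuilder and are visited in ascending order, so an intermediate stage is just an int that sorts between its two neighbours. Here is a minimal standalone sketch of that idea. The numeric values are written from memory of the Solr 4.x ResponseBuilder source, so verify them against your version; the value picked for EXECUTE_PREPARING_QUERY and the isBetween helper are my own assumptions, not part of Solr.

```java
public class Stages {
    // Values as recalled from Solr 4.x ResponseBuilder; check your own version.
    public static final int STAGE_START = 0;
    public static final int STAGE_PARSE_QUERY = 1000;
    public static final int STAGE_TOP_GROUPS = 1500;
    public static final int STAGE_EXECUTE_QUERY = 2000;
    public static final int STAGE_GET_FIELDS = 3000;
    public static final int STAGE_DONE = Integer.MAX_VALUE;

    // Hypothetical custom stage: any value strictly between the two neighbours works.
    public static final int EXECUTE_PREPARING_QUERY = (STAGE_START + STAGE_PARSE_QUERY) / 2;

    /** True if 'stage' sorts strictly after 'before' and strictly before 'after'. */
    public static boolean isBetween(int stage, int before, int after) {
        return before < stage && stage < after;
    }

    public static void main(String[] args) {
        // Sanity check that the custom stage runs before the main query stage.
        System.out.println(isBetween(EXECUTE_PREPARING_QUERY, STAGE_START, STAGE_EXECUTE_QUERY));
    }
}
```

In a real component you would reference ResponseBuilder's constants directly rather than redefining them.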

Override the distributedProcess method so that, when the current stage is your stage, it sets the right parameters for the shard request:

@Override public int distributedProcess(ResponseBuilder rb) {
    ...
    if (rb.stage == MY_STAGE) {
       ShardRequest sreq = new ShardRequest();
       sreq.purpose = ShardRequest.PURPOSE_PRIVATE;
       sreq.params = new ModifiableSolrParams();
       // set the parameters for the shard request
       rb.addRequest(this, sreq);
    }
    ...
}

Each shard now executes, on its own core, the request defined by the params you set. That happens at stage MY_STAGE. You still have to handle the shard responses, combine them and use them. The right place to handle them is the component's handleResponses method, so override handleResponses and process the shard responses when you're in the right stage. You will probably need to save them somewhere so you can reference them later in the finishStage method.

@Override public void handleResponses(ResponseBuilder rb, ShardRequest sreq) {
   ...
   if (rb.stage == MY_STAGE) {
      List<ShardResponse> responses = sreq.responses;
      for (ShardResponse response : responses) {
         //do something with the response, maybe save it somewhere
         rb.finished.remove(sreq);
      }
   }
   ...
}
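As for "save them somewhere": one option I believe is commonly used is the request-scoped context map (rb.req.getContext()), since both handleResponses and finishStage can reach it. The sketch below only models that pattern in plain Java, with a HashMap standing in for the context and lists of document ids standing in for shard responses. The class name, the key and both helper methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ShardResultStore {
    // Hypothetical key under which this component keeps its collected results.
    public static final String CONTEXT_KEY = "myComponent.preparingDocs";

    /** Appends one shard's document ids to the list kept in the context map. */
    @SuppressWarnings("unchecked")
    public static void collect(Map<Object, Object> context, List<String> shardDocIds) {
        List<String> merged =
            (List<String>) context.computeIfAbsent(CONTEXT_KEY, k -> new ArrayList<String>());
        merged.addAll(shardDocIds);
    }

    /** Returns everything collected so far, or an empty list if no shard has answered. */
    @SuppressWarnings("unchecked")
    public static List<String> collected(Map<Object, Object> context) {
        return (List<String>) context.getOrDefault(CONTEXT_KEY, new ArrayList<String>());
    }

    public static void main(String[] args) {
        Map<Object, Object> context = new HashMap<>();       // stands in for the request context
        collect(context, Arrays.asList("doc1", "doc4"));     // what handleResponses would do for shard 1
        collect(context, Arrays.asList("doc2"));             // ... and for shard 2
        System.out.println(collected(context));              // what finishStage would read back
    }
}
```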

Finally, override the finishStage method and do whatever you need to do with the combined results.

@Override public void finishStage(ResponseBuilder rb) {
   ...
   if (rb.stage == MY_STAGE) {
      // do whatever you need to do with the results
   }
   ...
}

The key point is to use the response builder stages to control when your component runs relative to the other components. You don't have to put code in the prepare method just because it must run before the actual query. You only have to create or use a stage that falls between STAGE_START and STAGE_EXECUTE_QUERY.
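To make the ordering concrete, here is a small plain-Java simulation of how I understand the distributed request loop: at each stage every component's distributedProcess is called, and the smallest stage any of them returns becomes the next stage. It is a simplified model, not Solr code; the Component interface, the two stand-in components and the value 500 for MY_STAGE are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class StageLoop {
    public static final int STAGE_START = 0;
    public static final int STAGE_EXECUTE_QUERY = 2000;
    public static final int STAGE_DONE = Integer.MAX_VALUE;
    public static final int MY_STAGE = 500; // hypothetical intermediate stage

    /** Minimal stand-in for a search component: returns the next stage it wants to run at. */
    public interface Component {
        int distributedProcess(int currentStage, List<String> log);
    }

    /** Stand-in for the custom component: asks for MY_STAGE, then does its work there. */
    public static final Component CUSTOM = (stage, log) -> {
        if (stage < MY_STAGE) return MY_STAGE;
        if (stage == MY_STAGE) log.add("custom@" + stage);
        return STAGE_DONE;
    };

    /** Stand-in for the query component: runs the main query at STAGE_EXECUTE_QUERY. */
    public static final Component QUERY = (stage, log) -> {
        if (stage < STAGE_EXECUTE_QUERY) return STAGE_EXECUTE_QUERY;
        if (stage == STAGE_EXECUTE_QUERY) log.add("query@" + stage);
        return STAGE_DONE;
    };

    /** Visits stages in ascending order; the next stage is the smallest one any component asks for. */
    public static List<String> run(List<Component> components) {
        List<String> log = new ArrayList<>();
        int nextStage = STAGE_START;
        do {
            int stage = nextStage;
            nextStage = STAGE_DONE;
            for (Component c : components) {
                nextStage = Math.min(nextStage, c.distributedProcess(stage, log));
            }
        } while (nextStage != STAGE_DONE);
        return log;
    }

    public static void main(String[] args) {
        // Registration order does not matter; the stage values decide who runs first.
        System.out.println(run(Arrays.asList(QUERY, CUSTOM)));
    }
}
```

Even though the query stand-in is listed first, the custom work is logged before the main query because MY_STAGE is lower than STAGE_EXECUTE_QUERY.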
