
Query multiple collections with different fields in Solr

Given the following (single-core) queries:

http://localhost/solr/a/select?indent=true&q=*:*&rows=100&start=0&wt=json
http://localhost/solr/b/select?indent=true&q=*:*&rows=100&start=0&wt=json

The first query returns "numFound":40000. The second query returns "numFound":10000.

I tried putting these together with:

   http://localhost/solr/a/select?indent=true&shards=localhost/solr/a,localhost/solr/b&q=*:*&rows=100&start=0&wt=json

Now I get "numFound":50000. The only problem is that "a" has more columns than "b", so the multi-collection request only returns the values of "a".

Is it possible to query multiple collections with different fields, or do they have to be the same? And how should I change my third URL to get this result?

What you need is what I call a unification core. That schema itself will have no content; it is only used as a sort of wrapper to unify the fields you want to display from both cores. In there you will need:

  • a schema.xml that wraps up all the fields that you want to have in your unified result
  • a query handler that combines the two different cores for you

An important restriction up front, taken from the Solr Wiki page about DistributedSearch:

Documents must have a unique key and the unique key must be stored (stored="true" in schema.xml). The unique key field must be unique across all shards. If docs with duplicate unique keys are encountered, Solr will make an attempt to return valid results, but the behavior may be non-deterministic.
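To make the uniqueness restriction concrete, here is a minimal sketch of a pre-flight check. It assumes you have already collected the unique keys of each shard into plain Python lists (for example by exporting the id field); the function name `duplicate_keys` and the sample data are hypothetical and not part of Solr.

```python
def duplicate_keys(ids_by_shard):
    """Return every key that occurs in more than one shard, with the shards holding it."""
    seen = {}
    for shard, ids in ids_by_shard.items():
        # set() so a key repeated inside one shard is only counted once for it
        for key in set(ids):
            seen.setdefault(key, []).append(shard)
    return {key: shards for key, shards in seen.items() if len(shards) > 1}


# Hypothetical key lists; in practice these would come from each core.
clashes = duplicate_keys({"shard-1": [1, 2, 3], "shard-2": [3, 4]})
print(clashes)
```

An empty result means the keys are disjoint, which is what a distributed query needs.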

As an example, I have shard-1 with the fields id, title, description and shard-2 with the fields id, title, abstractText. So I have these schemas:

Schema of shard-1

<schema name="shard-1" version="1.5">

  <fields>
    <field name="id"
          type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
          type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description"
          type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

Schema of shard-2

<schema name="shard-2" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

To unify these schemas I create a third schema that I call shard-unification, which contains all four fields.

<schema name="shard-unification" version="1.5">

  <fields>
    <field name="id" 
      type="int" indexed="true" stored="true" multiValued="false" />
    <field name="title" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="abstractText" 
      type="text" indexed="true" stored="true" multiValued="false" />
    <field name="description" 
      type="text" indexed="true" stored="true" multiValued="false" />
  </fields>
  <!-- type definition left out, have a look in github -->
</schema>

Now I need to make use of this combined schema, so I create a query handler in the solrconfig.xml of the shard-unification core:

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />

That's it. Now some index data is required in shard-1 and shard-2. To query for a unified result, just query shard-unification with the appropriate shards param.

http://localhost/solr/shard-unification/select?q=*:*&rows=100&start=0&wt=json&shards=localhost/solr/shard-1,localhost/solr/shard-2
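If the shard list is assembled in code rather than typed by hand, the same URL can be built programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_unified_query_url` is hypothetical, and the host and core names are simply the ones used in this answer.

```python
from urllib.parse import urlencode


def build_unified_query_url(base, shards, query="*:*", rows=100, start=0):
    """Return a select URL for the unification core with a `shards` parameter."""
    params = {
        "q": query,
        "rows": rows,
        "start": start,
        "wt": "json",
        # Solr expects the shards as one comma-separated value
        "shards": ",".join(shards),
    }
    # Leave *, :, / and , unescaped so the URL stays readable
    return base + "/select?" + urlencode(params, safe="*:/,")


url = build_unified_query_url(
    "http://localhost/solr/shard-unification",
    ["localhost/solr/shard-1", "localhost/solr/shard-2"],
)
print(url)
```

Adding a third core is then just one more entry in the list.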

This will return a result like:

{
  "responseHeader":{
    "status":0,
    "QTime":10},
  "response":{"numFound":2,"start":0,"maxScore":1.0,"docs":[
      {
        "id":1,
        "title":"title 1",
        "description":"description 1",
        "score":1.0},
      {
        "id":2,
        "title":"title 2",
        "abstractText":"abstract 2",
        "score":1.0}]
  }}
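Note that each document only carries the fields its own shard knows about, so a client has to tolerate missing keys. As a small illustration, assuming the JSON above has already been parsed into Python dictionaries, the documents can be projected onto the unified field list; the helper `to_rows` is hypothetical.

```python
def to_rows(docs, fields):
    """Project each document onto `fields`, using None where a field is absent."""
    return [{name: doc.get(name) for name in fields} for doc in docs]


# The two documents from the sample response above
docs = [
    {"id": 1, "title": "title 1", "description": "description 1", "score": 1.0},
    {"id": 2, "title": "title 2", "abstractText": "abstract 2", "score": 1.0},
]
rows = to_rows(docs, ["id", "title", "description", "abstractText"])
for row in rows:
    print(row)
```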

Fetch the origin shard of a document

If you want to fetch the originating shard into each document, you just need to specify [shard] within fl, either as a parameter with the query or within the request handler's defaults, see below. The brackets are mandatory; they will also be in the resulting response.

<requestHandler name="standard" class="solr.StandardRequestHandler" default="true">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="q.alt">*:*</str>
    <str name="qf">id title description abstractText</str>
    <str name="fl">*,score,[shard]</str>
    <str name="mm">100%</str>
  </lst>
</requestHandler>
<queryParser name="edismax" class="org.apache.solr.search.ExtendedDismaxQParserPlugin" />
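With [shard] in fl, a client can tell which core each hit came from. The sketch below groups document ids by that key. It assumes the response has been parsed into dictionaries and that the key appears literally as "[shard]", brackets included; the exact form of the shard value shown in the sample data is an assumption, and `group_ids_by_shard` is a hypothetical helper.

```python
from collections import defaultdict


def group_ids_by_shard(docs):
    """Map each originating shard to the ids of the documents it returned."""
    grouped = defaultdict(list)
    for doc in docs:
        # Documents without the key are collected under "unknown"
        grouped[doc.get("[shard]", "unknown")].append(doc["id"])
    return dict(grouped)


# Assumed sample data; the shard values are illustrative only
docs = [
    {"id": 1, "title": "title 1", "[shard]": "localhost/solr/shard-1"},
    {"id": 2, "title": "title 2", "[shard]": "localhost/solr/shard-2"},
]
by_shard = group_ids_by_shard(docs)
print(by_shard)
```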

Working Sample

If you want to see a running example, check out my solrsample project on GitHub and execute the ShardUnificationTest. I have also included the shard-fetching by now.

Shards should be used in Solr

When an index becomes too large to fit on a single system, or when a single query takes too long to execute

so the number and names of the columns should always be the same. This is specified in this document (where the previous quote also comes from): http://wiki.apache.org/solr/DistributedSearch

If you leave your query as it is and give the two shards the same fields, this should just work as expected.

If you want more info about how shards work in SolrCloud, have a look at this document as well: http://wiki.apache.org/solr/SolrCloud
