Solr documents with child elements?

Is it possible to create a Solr document that contains sub-elements?

For example, how would I represent something like this:

<person first="Bob" last="Smith">
   <children>
      <child first="Little" last="Smith" />
      <child first="Junior" last="Smith" />
   </children>
</person>

What is the usual way to solve this problem?

As of Solr 4.7/4.8, Solr supports nested documents; in JSON, the children go under the _childDocuments_ key:

{
  "id": "chapter1",
  "title": "Indexing Child Documents in JSON",
  "content_type": "chapter",
  "_childDocuments_": [
    {
      "id": "1-1",
      "content_type": "page",
      "text": "ho hum... this is page 1 of chapter 1"
    },
    {
      "id": "1-2",
      "content_type": "page",
      "text": "more text... this is page 2 of chapter 1"
    }
  ]
}
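
A minimal Python sketch of producing that kind of payload from the question's person/children data, assuming the data sits in plain dicts. The helper name to_nested_doc and the content_type values "person"/"child" are illustrative choices, not part of any Solr API; only the _childDocuments_ key comes from the example above.

```python
import json

def to_nested_doc(person_id, first, last, children):
    """Build a Solr JSON document whose children go under _childDocuments_.

    content_type is a hypothetical marker field used later to tell parents
    from children; Solr does not require this exact name.
    """
    return {
        "id": person_id,
        "content_type": "person",
        "first": first,
        "last": last,
        "_childDocuments_": [
            {
                # Children are ordinary index documents, so their ids must be
                # unique too; derive them from the parent id.
                "id": f"{person_id}-{i}",
                "content_type": "child",
                "first": c["first"],
                "last": c["last"],
            }
            for i, c in enumerate(children, start=1)
        ],
    }

doc = to_nested_doc("bob", "Bob", "Smith", [
    {"first": "Little", "last": "Smith"},
    {"first": "Junior", "last": "Smith"},
])
# The JSON update handler accepts an array of documents.
print(json.dumps([doc], indent=2))
```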

See the Solr release notes for more.

You can model this in different ways, depending on your searching/faceting needs. Usually you'll use multivalued or dynamic fields. In the following examples I'll omit the field type and the indexed and stored attributes:

<field name="first"/>
<field name="last"/>
<field name="child_first" multiValued="true"/>
<field name="child_last" multiValued="true"/>
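
A sketch of client code that fills these two multivalued fields and later re-pairs them, assuming the stored values of each field come back in the order they were indexed. The helper names flatten_children and pair_children are hypothetical.

```python
def flatten_children(person):
    """Flatten nested children into two parallel multivalued fields."""
    return {
        "first": person["first"],
        "last": person["last"],
        "child_first": [c["first"] for c in person["children"]],
        "child_last": [c["last"] for c in person["children"]],
    }

def pair_children(doc):
    """Re-pair by position: the i-th child_first goes with the i-th child_last."""
    return list(zip(doc.get("child_first", []), doc.get("child_last", [])))

person = {
    "first": "Bob",
    "last": "Smith",
    "children": [
        {"first": "Little", "last": "Smith"},
        {"first": "Junior", "last": "Smith"},
    ],
}
doc = flatten_children(person)
print(pair_children(doc))  # [('Little', 'Smith'), ('Junior', 'Smith')]
```

The pairing lives only in your code; Solr itself matches each field independently.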

It's up to you to correlate the children's first and last names. Or you could put both in a single field:

<field name="first"/>
<field name="last"/>
<field name="child_first_and_last" multiValued="true"/>
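
With one combined field, each value holds one child's full name, so the pairing cannot get lost. A sketch assuming a plain space as the separator (a hypothetical choice); the helper name combined_children is illustrative.

```python
def combined_children(person, sep=" "):
    """One value per child, e.g. 'Little Smith', for child_first_and_last."""
    return {
        "first": person["first"],
        "last": person["last"],
        "child_first_and_last": [
            f'{c["first"]}{sep}{c["last"]}' for c in person["children"]
        ],
    }

doc = combined_children({
    "first": "Bob",
    "last": "Smith",
    "children": [
        {"first": "Little", "last": "Smith"},
        {"first": "Junior", "last": "Smith"},
    ],
})
print(doc["child_first_and_last"])  # ['Little Smith', 'Junior Smith']
```

For a tokenized field, a phrase query such as child_first_and_last:"Little Smith" then only matches within a single child's value, as long as the field type's positionIncrementGap is larger than the phrase slop.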

Another option:

<field name="first"/>
<field name="last"/>
<dynamicField name="child_first_*"/>
<dynamicField name="child_last_*"/>

Here you would store the fields 'child_first_1', 'child_last_1', 'child_first_2', 'child_last_2', etc. Again it's up to you to correlate the values, but at least the field names carry an index. With some client code you could make this transparent.
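
A Python sketch of such client code: it writes the numbered dynamic fields and rebuilds the children from whatever child_first_N/child_last_N fields a returned document carries. Numbering from 1 and the helper names are assumptions.

```python
import re

def to_dynamic_fields(person):
    """Write child_first_1, child_last_1, child_first_2, ... for each child."""
    doc = {"first": person["first"], "last": person["last"]}
    for i, c in enumerate(person["children"], start=1):
        doc[f"child_first_{i}"] = c["first"]
        doc[f"child_last_{i}"] = c["last"]
    return doc

FIELD = re.compile(r"child_(first|last)_(\d+)$")

def from_dynamic_fields(doc):
    """Rebuild the children list, ordered numerically by the field suffix."""
    children = {}
    for name, value in doc.items():
        m = FIELD.match(name)
        if m:
            part, idx = m.group(1), int(m.group(2))
            children.setdefault(idx, {})[part] = value
    # Sort on the integer suffix so child_first_10 comes after child_first_9.
    return [children[i] for i in sorted(children)]

doc = to_dynamic_fields({
    "first": "Bob",
    "last": "Smith",
    "children": [
        {"first": "Little", "last": "Smith"},
        {"first": "Junior", "last": "Smith"},
    ],
})
print(from_dynamic_fields(doc))
# [{'first': 'Little', 'last': 'Smith'}, {'first': 'Junior', 'last': 'Smith'}]
```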

Bottom line: as the Solr wiki says, "Solr provides one table. Storing a set [of] database tables in an index generally requires denormalizing some of the tables. Attempts to avoid denormalizing usually fail." It's up to you to denormalize your data according to your search needs.
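
For this example, one common denormalization is one Solr document per child, each repeating the parent's fields. A sketch; the parent_id/parent_first/parent_last field names and the id scheme are hypothetical.

```python
def denormalize(person):
    """Emit one flat document per child, copying the parent's fields into each."""
    return [
        {
            "id": f'{person["id"]}-child-{i}',
            "parent_id": person["id"],
            "parent_first": person["first"],
            "parent_last": person["last"],
            "first": c["first"],
            "last": c["last"],
        }
        for i, c in enumerate(person["children"], start=1)
    ]

person = {
    "id": "bob",
    "first": "Bob",
    "last": "Smith",
    "children": [
        {"first": "Little", "last": "Smith"},
        {"first": "Junior", "last": "Smith"},
    ],
}
docs = denormalize(person)
for d in docs:
    print(d)
```

A query like first:Little AND last:Smith now hits exactly one child document, with the parent's data right there. A person without children would need a document of its own, and if you want one hit per person you could group results on parent_id (group=true&group.field=parent_id).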

UPDATE: Since version 4.5, Solr supports nested documents directly: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-BlockJoinQueryParsers
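
With children indexed as nested documents, the block join parent parser ({!parent which=...}) returns the parents whose children match. This Python sketch only builds the request URL; the host, the core name "people" and the content_type values are assumptions. The which filter should match every parent document and no children, and the child query should match only children.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

def parents_with_child(base_url, child_query, parent_filter="content_type:person"):
    """Build a select URL that uses the {!parent} block join query parser."""
    params = {
        # {!parent which=...} maps matching child documents to their parents.
        "q": f'{{!parent which="{parent_filter}"}}{child_query}',
        "wt": "json",
    }
    return f"{base_url}/select?{urlencode(params)}"

url = parents_with_child(
    "http://localhost:8983/solr/people",
    "+content_type:child +first:Little +last:Smith",
)
print(url)
# Decoded q parameter, as Solr will see it:
print(parse_qs(urlsplit(url).query)["q"][0])
```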

Having separate fields for the children leads to false-positive matches. Concatenated fields work to some extent, but it's a really limited approach. We have blogged about our extensive experience with similar tasks at http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
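
To make the false positive concrete, here is a small in-memory simulation (plain Python, not Solr) of a boolean AND over two independent multivalued child fields. The "Junior Jones" child is hypothetical data.

```python
# A person whose children are "Little Smith" and "Junior Jones" (hypothetical).
flat = {
    "child_first": ["Little", "Junior"],
    "child_last": ["Smith", "Jones"],
}

def flat_match(doc, first, last):
    """Like child_first:X AND child_last:Y: each clause may hit a different child."""
    return first in doc["child_first"] and last in doc["child_last"]

def per_child_match(doc, first, last):
    """What nested documents give you: both clauses must hit the same child."""
    return (first, last) in zip(doc["child_first"], doc["child_last"])

print(flat_match(flat, "Little", "Jones"))       # True  (false positive)
print(per_child_match(flat, "Little", "Jones"))  # False
```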

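Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.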

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM