
Is it possible to have documents with a subset of fields of the collection's schema under one Solr collection?

We have 4 different data sets and want to perform faceted search on them. We are currently using SolrCloud, and we flatten these data sets before indexing them into Solr. Even though our data is relational, our primary goal is faceted search, and Solr seemed like the right option.

Rough structure of our data:

Dataset1(col1, col2, col3, col4)
Dataset2(col1, col6, col7, col8)
Dataset3(col6, col9, col10)

Flattened dataset: dataset(col1, col2, col3, col4, col6, col7, col8, col9, col10)

In the end, we flattened them into one common structure, with nulls where values do not exist. So far Solr works great.

Problem: We now have additional data sets coming in, and each of them has about 50-60 columns. Technically, I could flatten these too, but I don't think that is a good idea. I know that I can create a separate collection with its own schema for each data set. However, we perform group-by operations across these documents, so we need a single schema.
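To make the grouping requirement concrete, a grouped and faceted request of this kind could be built as sketched below. This is only an illustration: the host, the collection name `datasets`, and the choice of `col1` as the group field and `col6`/`col9` as facet fields are assumptions, and `grouped_facet_query` is a hypothetical helper. The parameters `group`, `group.field`, `facet`, and `facet.field` are standard Solr query parameters.

```python
from urllib.parse import urlencode


def grouped_facet_query(base_url, collection, group_field, facet_fields, q="*:*"):
    """Build a Solr /select URL that groups on one field and facets on others.

    The function name and arguments are hypothetical; only the query
    parameter names are Solr's own.
    """
    params = [
        ("q", q),
        ("wt", "json"),
        ("group", "true"),
        ("group.field", group_field),
        ("facet", "true"),
    ]
    # facet.field may be repeated, once per field to facet on.
    params += [("facet.field", field) for field in facet_fields]
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Assumed host and collection name, for illustration only.
url = grouped_facet_query(
    "http://localhost:8983/solr", "datasets", "col1", ["col6", "col9"]
)
print(url)
```

Because grouping and faceting run inside a single collection, every field named in such a request has to be defined in that collection's schema, which is why one shared schema is needed here.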

Is there any way to keep documents that use only a subset of the schema's fields under one collection without flattening them? If not, is there a better solution to this problem?

For instance:

DocA(field1, field2)
DocB(field3, field4)
Schema(field1, field2, field3, field4)

Can we have DocA and DocB under one collection with the above schema?

Our backend runs on the Cloudera Hadoop distribution (CDH 4.6 and 5.2), and we can choose any tool from the Hadoop ecosystem for a possible solution.

Of course you can. Each document only needs its own unique value for the uniqueKey field. A document does not have to populate every field in the schema; only fields declared as required must be present. If you have defined a fixed Solr schema, dynamic fields (dynamicField) may help you.
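As a rough illustration of this answer, the sketch below builds DocA and DocB as sparse documents that simply omit the fields they do not have, instead of carrying nulls. It rests on several assumptions that are not in the original post: the uniqueKey field is called `id`, the schema declares dynamic-field patterns such as `*_s`, `*_i`, `*_d`, and `*_b` (as Solr's example schema does), and the update URL accepts JSON. The helper names `sparse_doc`, `dynamic_name`, and `post_docs` are hypothetical.

```python
import json
from urllib.request import Request, urlopen


def sparse_doc(doc_id, record):
    """Return a Solr document holding only the fields that have a value.

    Assumes the schema's uniqueKey field is named "id".
    """
    doc = {"id": doc_id}
    doc.update({name: value for name, value in record.items() if value is not None})
    return doc


def dynamic_name(column, value):
    """Map a column to a dynamic-field name by the Python type of its value.

    Assumes the schema defines *_b, *_i, *_d and *_s dynamic fields.
    """
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        suffix = "_b"
    elif isinstance(value, int):
        suffix = "_i"
    elif isinstance(value, float):
        suffix = "_d"
    else:
        suffix = "_s"
    return column + suffix


def post_docs(update_url, docs):
    """POST a list of documents as JSON to a Solr update URL (not called here)."""
    body = json.dumps(docs).encode("utf-8")
    request = Request(update_url, data=body, headers={"Content-Type": "application/json"})
    with urlopen(request) as response:
        return response.status


# DocA and DocB share one schema but each carries only its own fields.
doc_a = sparse_doc("A-1", {"field1": "x", "field2": 3, "field3": None, "field4": None})
doc_b = sparse_doc("B-1", {"field3": "y", "field4": 2.5})
print(json.dumps([doc_a, doc_b]))
```

For the new data sets with 50-60 columns, something like `dynamic_name("col42", 7)` would yield a field name that matches a dynamic-field pattern, so each new column would not need its own explicit entry in the schema. Fields used for grouping or faceting across data sets would still need consistent names and types.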


 