
Solr single index vs Solr multi core

I need some assistance deciding between creating a single index in a single Solr instance and creating multiple cores in a single Solr instance, with each core serving its own index. My understanding is that a single index in Solr is usually used to index one type of document. What is the best practice when you have different document types? For example, if you want to index the details of an invoice transaction, you could create a schema with the following fields for an invoice transaction document:

  • invoiceDate
  • dueDate
  • invoiceSummary
  • billingContact
  • invoiceLineItems
  • notes

Let's say you also want to index details of products. Would you create a new document type with a schema as follows:

  • productCode
  • productDescription
  • sellingPrice
  • buyingPrice
  • onHand
  • avgCost
  • notes

and create a new core in Solr to index product documents? Or would you merge both invoice transaction and product into one schema as follows:

  • invoiceDate
  • dueDate
  • invoiceSummary
  • billingContact
  • invoiceLineItems
  • productCode
  • productDescription
  • sellingPrice
  • buyingPrice
  • onHand
  • avgCost
  • notes

and have just the one core indexing the above document, instead of having an "Invoice" core and a "Product" core indexing the two different documents?

I guess it makes sense to have a single flat index, as suggested in the Solr wiki, when the fields are similar. In an example like the one above, however, the data are not even remotely related to one another because they are separate entities. I have seen people suggest adding an extra field to distinguish between the different entities, such as a table name field, and filtering the query on that field, which I guess works. I am not sure how well that scales, though, when you have a use case as follows:

"Search invoices for the keyword 'John'; the fields to search are 'billingContact', 'invoiceSummary', 'notes'. Boost the 'billingContact' field at query time. Also search products for 'John'; the fields to search are 'productDescription', 'supplier', 'notes'. Boost 'supplier' at query time. Return only 100 invoices and 100 products."

The application I am working on needs to search across invoices and products from a single form. The application has no separate sections that search for different things.
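To make the single-index option concrete, here is a minimal sketch of how the use case above might be expressed against one index with a discriminator field. The field name `docType`, its values `invoice` and `product`, and the boost factor `2.0` are assumptions made for illustration; they do not come from the schema above. `q`, `defType`, `qf`, `fq` and `rows` are standard Solr query parameters, and the code only builds the parameters without contacting a server.

```python
from urllib.parse import urlencode


def build_params(keyword, doc_type, boosted_field, other_fields, rows=100):
    """Build eDisMax query parameters restricted to one document type."""
    # Boost one field at query time; the remaining fields keep the default weight.
    qf = " ".join([f"{boosted_field}^2.0"] + list(other_fields))
    return {
        "q": keyword,
        "defType": "edismax",
        "qf": qf,
        # Hypothetical discriminator field separating the entities in one index.
        "fq": f"docType:{doc_type}",
        "rows": rows,
    }


invoice_params = build_params(
    "John", "invoice", "billingContact", ["invoiceSummary", "notes"]
)
product_params = build_params(
    "John", "product", "supplier", ["productDescription", "notes"]
)

# URL-encoded query strings, ready to append to a select handler URL.
print(urlencode(invoice_params))
print(urlencode(product_params))
```

Note that even with a single index, this sketch still issues two requests, because each entity has its own boosted field and its own limit of 100 rows.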

My concerns about putting everything in one index:

1) Large index size, e.g. 50 million invoices + 50 million products in a single index.

2) Reindexing an index of that size.

3) Index tuning: wouldn't it be easier to tweak/tune each separate index to serve specific expected search outcomes, rather than trying to do that in a single index?

4) If we decide to index billing contact details as well in the future, that will add more fields to be indexed and add to my concerns in points 1) and 2).

"Return only 100 invoices and 100 products."

also

"Boost 'billingContact' field at query time. [...] Boost 'supplier' at query time."

This suggests that even though you are searching for the same terms, you are searching them as separate concepts.

Based on this and the lack of common fields, I would recommend starting with separate collections.
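As a rough illustration of the separate-collections approach, the sketch below builds one select URL per core, each with its own boosted field and its own row limit. The core names `invoices` and `products`, the base URL, and the boost factor `2.0` are hypothetical; the code only constructs the URLs and does not send any request.

```python
from urllib.parse import urlencode

# Assumed local Solr instance; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr"

# Hypothetical per-core settings: each core boosts a different field.
CORE_SETTINGS = {
    "invoices": {"qf": "billingContact^2.0 invoiceSummary notes"},
    "products": {"qf": "supplier^2.0 productDescription notes"},
}


def select_url(core, keyword, rows=100):
    """Build the select URL for one core with its own field weights."""
    params = {
        "q": keyword,
        "defType": "edismax",
        "qf": CORE_SETTINGS[core]["qf"],
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


# One request per core; the application merges the two result lists itself.
urls = {core: select_url(core, "John") for core in CORE_SETTINGS}
for core, url in urls.items():
    print(core, url)
```

With this layout, each core can be tuned and reindexed independently, and the single search form simply fans out to both cores and displays the two result sets side by side.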
