Solr Schema Design

I have some questions regarding Solr schema design. Basically, I'm setting up a search engine for a product catalogue website, and my table relationships are as follows.

  • Product belongs to Merchant
  • Product belongs to Brand
  • Product has and belongs to many Categories
  • Category has many Sub Categories
  • Sub Category has many Types
  • Type has many Sub Types

So far my schema.xml looks like this.

<field name="product_id" type="string" indexed="true" stored="true" required="true" /> 
<field name="name" type="string" indexed="true" stored="true"/>
<field name="merchant" type="string" indexed="true" stored="true"/>
<field name="merchant_id" type="string" indexed="true" stored="true"/>
<field name="brand" type="string" indexed="true" stored="true"/>
<field name="brand_id" type="string" indexed="true" stored="true"/>
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="sub_categories" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="types" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="sub_types" type="string" multiValued="true" indexed="true" stored="true"/>
<field name="price" type="float" indexed="true" stored="true"/>
<field name="description" type="text" indexed="true" stored="true"/>
<field name="image" type="text" indexed="true" stored="true"/>

<field name="text" type="text" indexed="true" stored="false" multiValued="true"/>

<uniqueKey>product_id</uniqueKey>

<defaultSearchField>text</defaultSearchField>

<solrQueryParser defaultOperator="OR"/>

<copyField source="name" dest="text"/>
<copyField source="merchant" dest="text"/>
<copyField source="brand" dest="text"/>
<copyField source="categories" dest="text"/>
<copyField source="sub_categories" dest="text"/>
<copyField source="types" dest="text"/>
<copyField source="sub_types" dest="text"/>

So my questions now:

1) Is the schema correct?

2) Let's assume I need to find products for Category XYZ. My senior programmer doesn't like querying Solr by Category Name; instead he wants to use CategoryID. He is suggesting that we store CategoryID_CategoryName (1001_Category XYZ) and that the web front end send the ID. (This assumes that names with white space don't work properly.)

So to find the products I would then have to do a partial match on categories and extract the category ID from the string (i.e. fetch 1001 from 1001_Category XYZ). Or what if I keep the names in the categories field and set up another field for category_ids? That seems a better option to me.

or

Is there any Solr multi-valued field type that can store CategoryID and CategoryName together?

Let me know your thoughts, thanks.

Answers to your questions:

  1. Maybe. It depends on how you plan to structure your queries, what you intend to search, and what you intend to retrieve in search results. In your schema you are storing and indexing everything, which can be quite inefficient. Index what you intend to query; store what you intend to retrieve or display. If you are looking for optimizations, I would review the data types used in the schema: try to stay as close to the source type as you can.
  2. Querying by CategoryId: your programmer is correct, you want to query by category ID. Your approach of storing IDs and names in separate fields is sound as well. Presuming your ID-based fields are integers/longs, you don't want to define them as strings but as integers/longs.
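
As a rough illustration of both points, the fragment below sketches how the category fields might be split into an ID field and a name field, and how a display-only field could be stored without being indexed. The field names category_ids, and the choice of image as a display-only field, are assumptions made for the example; it also assumes a fieldType named "int" is declared in the schema, which may not match your setup.

```xml
<!-- Sketch only: category_ids is a hypothetical field name, and a
     fieldType named "int" is assumed to be declared in this schema. -->

<!-- IDs: numeric and indexed, so the front end can filter by ID,
     for example with fq=category_ids:1001 -->
<field name="category_ids" type="int" multiValued="true" indexed="true" stored="true"/>

<!-- Names: kept as they are for display and for the copyField into "text" -->
<field name="categories" type="string" multiValued="true" indexed="true" stored="true"/>

<!-- Index what you query, store what you display: assuming nobody
     searches on the image value, it can be stored without being indexed -->
<field name="image" type="string" indexed="false" stored="true"/>
```

With this layout there is no need to parse 1001 out of a combined 1001_Category XYZ string. If the client has to pair each ID with its name, it would need both fields to be populated in the same order at index time. The same treatment could be applied to merchant_id and brand_id if those are numeric in the source tables.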


 