Handle multiple MySQL tables with Solr 5.1.0?

I have more than 30 tables in my MySQL database. Recently I imported data from one table into Solr 5.1.0 using the DataImportHandler, with the following query in my data-config.xml file:

select * from table-name

But for my search I have to integrate more than 10 tables to give proper search results.

The ways to do this are:

1) Import the data using a JOIN query in the MySQL database

OR

2) JOIN Solr cores after importing the full data of each table separately.

What should I do to optimize this, and which is the better approach?

If you have a single core, I would recommend importing the tables into that one core and using joins. That is what I have done on Solr 4.9 with CakePHP and SolrPhpClient. For this you will have to define the table structure and data types in data-config.xml and schema.xml, which I assume you have already done. In your data-config.xml file you write queries, or define an entity structure, that import all the data from your ten tables accordingly.

See my example for two tables:

<entity name="type_masters" pk="type_id"
        query="SELECT delete_status as type_masters_delete_status,type_updated,type_id,category_id,type_name FROM type_masters where type_id='${businessmasters.Business_Type}'"
        deltaQuery="select type_id from type_masters where type_updated > '${dih.last_index_time}'"
        parentDeltaQuery="select business_id from businessmasters where Business_Type=${type_masters.type_id}">
  <field column="type_id" name="id"/>
  <field column="category_id" name="category_id" indexed="true" stored="true" />
  <field column="type_name" name="type_name" indexed="true" stored="true" />
  <field column="type_updated" name="type_updated" indexed="true" stored="true" />
  <field column="type_masters_delete_status" name="type_masters_delete_status" indexed="true" stored="true" />

  <entity name="category_masters"
          query="SELECT delete_status as category_masters_delete_status,category_updated,category_id,category_name FROM category_masters where category_id='${type_masters.category_id}'"
          deltaQuery="select category_id from category_masters where category_updated > '${dih.last_index_time}'"
          parentDeltaQuery="select type_id from type_masters where category_id=${category_masters.category_id}">
    <field column="category_id" name="id"/>
    <field column="category_name" name="category_name" indexed="true" stored="true" />
    <field column="category_updated" name="category_updated" indexed="true" stored="true" />
    <field column="category_masters_delete_status" name="category_masters_delete_status" indexed="true" stored="true" />
  </entity><!-- category_masters -->

</entity><!-- type_masters -->
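
For comparison, the same two tables could also be flattened with a single SQL JOIN instead of nested entities. This is only a minimal sketch, not the configuration used above: the entity name `types_with_categories` and the mapping to Solr field names are assumptions, the columns are the ones that appear in the queries above, and delta-import attributes are left out for brevity.

```xml
<!-- Sketch only: entity name and Solr field names are assumptions. -->
<!-- One JOIN query returns one row per type, with its category columns attached. -->
<entity name="types_with_categories" pk="type_id"
        query="SELECT t.type_id, t.type_name, t.category_id, c.category_name
               FROM type_masters t
               LEFT JOIN category_masters c ON c.category_id = t.category_id">
  <field column="type_id" name="id"/>
  <field column="type_name" name="type_name"/>
  <field column="category_id" name="category_id"/>
  <field column="category_name" name="category_name"/>
</entity>
```

A nested entity runs its query once for every parent row, so a single JOIN generally means fewer round trips to MySQL during a full import. The nested form can still be the better fit when a child table returns several rows per parent that should end up in a multivalued field.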
  1. Import the data using a JOIN query in the MySQL database

    Yes, this is achievable in Solr using the DIH (DataImportHandler). You configure it in data-config.xml, where you can write a query with joins that fetches the data from all the desired tables. This way you create a single core that holds all the data, and you build your documents from the selected fields (the document fields are declared in schema.xml).

    For optimization, the main point to consider is which fields you want to search on and which you want to show in the results. So you need to sort this out first.

    Mark the fields you need to search on as indexed="true" and all the rest as indexed="false". Mark the fields you need in the results as stored="true" and all the rest as stored="false".

    Some fields may be required for both searching and showing in the results. Mark them as indexed="true" and stored="true".

    For example, I had 15 fields in my document but only 4 are indexed, as I want to search only on those fields. The remaining fields are only shown in the results, so they are stored.
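
The indexed/stored choices described here might look roughly like the following in schema.xml. This is a hedged sketch: the field names are borrowed from the two-table example earlier on this page, and the `text_general` and `string` types are assumed to be defined in your schema (they are in Solr's sample configurations, but your schema may differ).

```xml
<!-- schema.xml sketch; field names and types are assumptions. -->

<!-- Searched and shown in results: indexed and stored. -->
<field name="type_name" type="text_general" indexed="true" stored="true"/>

<!-- Searched but never shown in results: indexed only. -->
<field name="category_name" type="text_general" indexed="true" stored="false"/>

<!-- Shown in results but never searched: stored only. -->
<field name="category_id" type="string" indexed="false" stored="true"/>
```

Keeping fields unindexed or unstored when they are not needed reduces index size, which is the point of deciding the search and display fields up front.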

    Now coming to your second question:

    JOIN Solr cores after importing the full data of separate tables. Yes, this has been possible in Solr since Solr 4.0.

    For a detailed example, check the link below: https://wiki.apache.org/solr/Join

    But also consider its limitations.

  1. Fields or other properties of the documents being joined "from" are not available for use in processing the resulting set of "to" documents (i.e., you cannot return fields in the "from" documents as if they were a multivalued field on the "to" documents).

    So you can consider these points before you make a final decision.

Consider that you have two cores:

core brands with fields {id, name}
core products with fields {id, name, brand_id}

data in core BRANDS: {1, Apple}, {2, Samsung}, {3, HTC}

data in core PRODUCTS: {1, iPhone, 1}, {2, iPad, 1}, {3, Galaxy S3, 2}, {4, Galaxy Note, 2}, {5, One X, 3}

You would build your query like this:

http://example.com:8999/solr/brands/select?q=*:*&fq={!join from=brand_id to=id fromIndex=products}name:iPad

and the result will be: {id: "1", name: "Apple"}
  2. In a DistributedSearch environment, you cannot join across cores on multiple nodes. If, however, you have a custom sharding approach, you could join across cores on the same node.

  3. The Join query produces constant scores for all documents that match; scores computed by the nested query for the "from" documents are not available for scoring the "to" documents.

    Considering the above points, I hope you can decide which approach you want to take.
