
Solr Data Import Handler (DIH) - MySQL Import Performance

I have a MySQL database with 4 million products, which I am importing into Solr using DIH so that I can run elaborate searches. However, the data relationships mean that I actually request far more than four million records (e.g. one product may have many colours), and it takes over 8 hours to build the index.

Is there a way to improve indexing performance without using delta queries? For example, is the bottleneck caused by the multiple "join" conditions I am using? I can't see any indexing performance statistics in Solr, so it is very hard to diagnose where the bottleneck is.

This is my data-config.xml file:

Thanks,

<document>
    <entity name="A" pk="id" query="SELECT id AS id_productByStore, id_product, id_store, ... FROM A">
        <entity name="B" pk="id" query="SELECT id, cleanTitle, id_brand, ... FROM B WHERE id = '${A.id_product}'">
            <entity name="C" pk="id" query="SELECT name, alias FROM C WHERE id = '${B.id_brand}'"></entity>
            <entity name="D" pk="id" query="SELECT name FROM D WHERE id = '${B.id_category}'"></entity>
            <entity name="E" pk="id" query="SELECT gender FROM E WHERE id = '${B.id_gender}'"></entity>
            <entity name="F" pk="id" query="SELECT id_colour FROM F WHERE id_colourSet = '${B.id_colourSet}'">
                <entity name="G" pk="id" query="SELECT title FROM G WHERE id = '${F.id_colour}'"></entity>
            </entity>
        </entity>
        <entity name="H" pk="id" query="SELECT name FROM H WHERE id = '${A.id_store}'"></entity>
    </entity>
</document>

If your MySQL DB and Solr server are not on the same machine, you could have a network issue on your hands. The DB and Solr server at my shop aren't on the same machine, and imports sometimes slow down a lot depending on what else is going on that day.

Your nested entities are probably the biggest contributor. When Solr imports documents, it appears to treat nested entities as nested loops: each child entity's query runs once for every row returned by its parent. You would probably be much better off using a series of inner or right joins to bring your columns together in one query.

We used to use nested entities where I work, and imports could take hours. We were able to write a fairly complex MySQL join to replace those nested entities. Our full imports are now typically in the 10 to 15 minute range, and we're pulling in about 3 or 4 million records. Deltas are in the 5 to 10 minute range. Even if you can't join all your columns, joining as many as possible and using nested entities for the rest should help your indexing time.
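
As a rough sketch of what that could look like for the config in the question (not a tested drop-in): the one-to-one lookups into B, C, D, E and H are folded into A's query, and only the one-to-many colour lookup stays nested. The column aliases (brand_name, brand_alias, category_name, store_name, colour_title) are made up here to avoid name clashes and would need matching fields in your schema. The "..." placeholders stand for the columns elided in the question, and id_category, id_gender and id_colourSet are assumed to be columns of B because the original queries reference them through B.

```xml
<document>
    <!-- One joined query replaces the per-row lookups into B, C, D, E and H. -->
    <!-- LEFT JOIN keeps rows of A that have no match; use INNER JOIN if a match is guaranteed. -->
    <entity name="A" pk="id"
            query="SELECT A.id AS id_productByStore, A.id_product, A.id_store, ...,
                          B.cleanTitle, B.id_brand, B.id_colourSet, ...,
                          C.name AS brand_name, C.alias AS brand_alias,
                          D.name AS category_name,
                          E.gender,
                          H.name AS store_name
                   FROM A
                   LEFT JOIN B ON B.id = A.id_product
                   LEFT JOIN C ON C.id = B.id_brand
                   LEFT JOIN D ON D.id = B.id_category
                   LEFT JOIN E ON E.id = B.id_gender
                   LEFT JOIN H ON H.id = A.id_store">
        <!-- Colours are one-to-many, so they stay nested, but F and G are joined into one lookup. -->
        <entity name="F"
                query="SELECT G.title AS colour_title
                       FROM F
                       LEFT JOIN G ON G.id = F.id_colour
                       WHERE F.id_colourSet = '${A.id_colourSet}'"></entity>
    </entity>
</document>
```

With this layout the colour query still runs once per row of A, but the six or more per-row lookups in the original config drop to one.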


 