
Solr - Is there a way to speed up my import?

I have a relational database model. These are the basics of my data-config.xml:

<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ...">
    <entity name="Entity1" pk="Id1" query="SELECT [Text] Tag from [Table2] where ResourceId = '${MyMainEntity.pId}'"></entity>
    <entity name="Entity1" pk="Id2" query="SELECT [Text] Tag from [Table2] where ResourceId2 = '${MyMainEntity.pId}'"></entity>
    <entity name="LibraryItem" pk="ResourceId" 
            query="select SKU
                    FROM [TableB] 
                    INNER JOIN ...
                    ON ...
                    INNER JOIN ...
                    ON ...
                    WHERE ... AND ...'">
    </entity>
</entity>

Now, this takes a lot of time.
The first query returns 10,000 rows, and then the inner entities are fetched for each of those rows (around 10 rows each).

If I use a DB profiler, I see the three inner entity queries running over and over (3 select statements, then again 3 select statements, and so on).
This is really not efficient, and the import can run for over 40 hours.
Now, what are my options to run it faster?

  1. Obviously there is the option to flatten the tables into one big table, but that would create a lot of other side effects. I would really like to avoid that extra effort and run Solr on my production relational tables.
    So far it works great out of the box, and I am asking here whether there is a configuration tweak.
  2. If I do flatten the rows, does schema.xml need to change too, or will the fields that are multivalued stay multivalued?

Thanks.

Without changing the schema of the DB, the first thing to try is caching. If the inner entities cache well, the gains will be substantial.

The wiki may not be up to date, so you should check the JIRA issues, namely SOLR-2382, and maybe have a look at SOLR-2948 too.
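
As a rough illustration of the caching idea, here is a minimal sketch of how the first inner entity might be cached, using the cache attributes on SqlEntityProcessor. It assumes the inner query can be rewritten to return every row of [Table2] together with its ResourceId column, and that the parent row exposes the key as MyMainEntity.pId. Both are guesses about the schema in the question, not something the question confirms.

```xml
<!-- Sketch only: the cacheKey column and the lookup variable are assumptions about the asker's schema. -->
<entity name="MyMainEntity" pk="pID" query="select ... from [dbo].[TableA] inner join TableB on ...">
    <!-- The inner query has no WHERE clause on the parent key, so it runs once per import
         rather than once per parent row. The key column (ResourceId) must be selected
         so that the cached rows can be looked up by it. -->
    <entity name="Entity1"
            processor="SqlEntityProcessor"
            cacheImpl="SortedMapBackedCache"
            cacheKey="ResourceId"
            cacheLookup="MyMainEntity.pId"
            query="SELECT ResourceId, [Text] Tag FROM [Table2]">
    </entity>
</entity>
```

Older DIH versions express the same thing with processor="CachedSqlEntityProcessor" and a where="ResourceId=MyMainEntity.pId" attribute. Either way the whole inner result set is held in memory, so this pays off when the inner tables are small enough to fit and are hit repeatedly, as the profiler output in the question suggests.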

A second path could be multithreading DIH, but that is trickier. At one point this was an option, but it was later removed because it was buggy. I think there is now a JIRA issue trying to reimplement it, so try looking it up, but I recommend caching first.
