
Solr full-import in several smaller chunks

I'm trying to import a big MySQL database into Solr, and the import queries put a heavy load on the server (which might affect the production application that is running and using the database at the same time). Is there a way to split the full import into several smaller chunks? I didn't find anything on this subject either here or in Solr's documentation.

I know about the delta import feature, but I'm using it for delta imports of new/changed data.

Of course, you can add a condition like

WHERE pk<'${dataimporter.request.INDEX}'

and pass INDEX in the request params. So each time you call full-import, only part of the records are indexed. Remember to use &clean=false, of course, or the contents will be wiped out each time.
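As a minimal sketch, the condition could live in the entity query of data-config.xml (the entity name, table, and field mapping below are placeholders, and the < has to be XML-escaped inside the attribute):

<entity name="item"
        query="SELECT * FROM item WHERE pk &lt; '${dataimporter.request.INDEX}'">
    <field column="pk" name="id"/>
</entity>

Then each chunk would be triggered with a request along these lines (core name and INDEX value are just examples):

http://localhost:8983/solr/yourcore/dataimport?command=full-import&clean=false&INDEX=100000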

You could also check batchSize:

batchSize (default: 500) – sets the maximum number of records (or rather a suggestion to the driver) retrieved from the database in one query. Changing this parameter can help in situations where queries return too many results. It may not help, though, since the implementation of this mechanism depends on the JDBC driver.
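For reference, batchSize is set on the JdbcDataSource in data-config.xml; a sketch with placeholder connection details might look like this (with MySQL, batchSize="-1" is often used instead so the Connector/J driver streams results row by row):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="user" password="password"
            batchSize="500"/>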

http://lucene.472066.n3.nabble.com/DataImportHandler-running-out-of-memory-td490797.html

