

Posting a large directory of files to Solr using the post tool: how to commit after every file

I am using the Java post tool for Solr to upload and index a directory of documents. There are several thousand documents. Solr only commits at the very end of the process, and sometimes the run stops before it completes, so I lose all the work.

Does anyone have a technique to fetch the name of each document and call post on it, so that each document gets its own commit, rather than one large commit of all the documents at the end?

From the help page for the post tool:

Other options:
  ..
  -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)

This should allow you to use -params "commitWithin=1000" to make sure each document shows up within one second of being added to the index (the commitWithin value is given in milliseconds).
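As a rough illustration of that suggestion, the sketch below assembles a post tool command line that passes commitWithin through -params. The tool path, collection name, and directory are hypothetical placeholders, and the `-c` collection flag is assumed to be available in the installed version of the tool.

```python
import subprocess


def build_post_command(post_tool, collection, target, commit_within_ms=1000):
    """Return the argument list for one post-tool run with commitWithin set.

    post_tool, collection and target are placeholders for illustration;
    adjust them to the local Solr install.
    """
    return [
        post_tool,
        "-c", collection,                               # assumed collection flag
        "-params", f"commitWithin={commit_within_ms}",  # passed through to the update request
        target,                                         # file or directory to post
    ]


def run_post(post_tool, collection, target, commit_within_ms=1000):
    """Run the post tool and raise CalledProcessError if it exits non-zero."""
    cmd = build_post_command(post_tool, collection, target, commit_within_ms)
    subprocess.run(cmd, check=True)
```

Something like `run_post("bin/post", "mycollection", "/data/docs")` would then post the whole directory in one run, with Solr committing the added documents within roughly a second of receiving them.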

Committing after each document is overkill in terms of performance. In any case, it is quite strange that you have to resubmit everything from the start if something goes wrong. I suggest seriously reconsidering the indexing strategy you are using, rather than looking for a different way to commit.
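If per-file posting is still wanted, one possible way to avoid starting over after a failure is to record each file once it has been posted and skip recorded files on the next run. The sketch below is only an assumption about how that could be arranged: the function names, the progress log, the tool path, and the collection name are all hypothetical, and it relies on the post tool committing at the end of each run.

```python
import subprocess
from pathlib import Path


def pending_files(doc_dir, done_log):
    """Return files under doc_dir that are not yet listed in done_log, sorted."""
    log = Path(done_log)
    done = set(log.read_text(encoding="utf-8").splitlines()) if log.exists() else set()
    return [p for p in sorted(Path(doc_dir).rglob("*"))
            if p.is_file() and str(p) not in done]


def post_all(doc_dir, done_log, post_one):
    """Post each pending file; record it only after post_one returns normally."""
    count = 0
    with open(done_log, "a", encoding="utf-8") as log:
        for path in pending_files(doc_dir, done_log):
            post_one(path)           # raises on failure, so the file stays pending
            log.write(f"{path}\n")
            log.flush()              # keep the record even if a later file fails
            count += 1
    return count


def post_with_tool(path, post_tool="bin/post", collection="mycollection"):
    """Hypothetical per-file invocation: one post-tool run (and commit) per file."""
    subprocess.run([post_tool, "-c", collection, str(path)], check=True)
```

Calling `post_all("/data/docs", "/tmp/posted.log", post_with_tool)` again after an interruption would pick up at the first unrecorded file. The progress log should live outside the document directory so that it is not posted itself. This still pays the per-file commit cost the answer warns about.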

That said, if you have no other way to change the commit configuration, I suggest configuring autoCommit in your Solr collection/index, or using the commitWithin parameter, as suggested by @MatsLindh. Just check whether the tool you are using lets you add this parameter.

autoCommit

These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (i.e., when pushing documents), or in an update RequestHandler.
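To make "defined when making the update request" concrete, here is a minimal sketch that attaches commitWithin as a query parameter on a request to the update handler, without going through the post tool. The base URL and collection name are assumptions for illustration, and the helper names are not part of any Solr client library.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def update_url(base_url, collection, commit_within_ms):
    """Build an update-handler URL carrying commitWithin as a request parameter."""
    query = urlencode({"commitWithin": commit_within_ms})
    return f"{base_url.rstrip('/')}/{collection}/update?{query}"


def send_xml(url, xml_bytes):
    """POST an XML update message to the given URL and return the HTTP status."""
    req = Request(url, data=xml_bytes, headers={"Content-Type": "application/xml"})
    with urlopen(req) as resp:
        return resp.status
```

For example, `update_url("http://localhost:8983/solr", "mycollection", 1000)` yields a URL that could be passed to `send_xml` together with the bytes of an XML `<add>` message.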


 