
Posting large directory of files to SOLR using post tool, how to commit after every file

I am using the Java post tool for Solr to upload and index a directory of several thousand documents. Solr only commits at the very end of the process, and sometimes the run stops before it completes, so I lose all the work.

Does anyone have a technique for fetching the name of each document and calling post on it, so that a commit happens for each document rather than one large commit for all the documents at the end?

From the help page for the post tool:

Other options:
  ..
  -params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)

This should allow you to use -params "commitWithin=1000" to make sure each document shows up within one second of being added to the index.
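As a rough sketch of how this could be combined with the per-file approach asked about in the question, the Python wrapper below invokes the post tool once per file and passes commitWithin through -params. The tool path `bin/post`, the `-c` collection flag, the collection name, and the helper names `build_post_command` and `post_directory` are assumptions for illustration; adjust them to match your installation.

```python
import subprocess
from pathlib import Path


def build_post_command(doc, collection, commit_within_ms=1000, post_tool="bin/post"):
    """Return the argument list for posting a single file.

    Assumes the post tool accepts -c <collection> and -params "<key>=<value>".
    """
    return [
        post_tool,
        "-c", collection,
        "-params", f"commitWithin={commit_within_ms}",
        str(doc),
    ]


def post_directory(directory, collection, run=subprocess.run, **kwargs):
    """Post every regular file in `directory` one at a time.

    Returns the list of files whose post command exited with a non-zero
    status, so only those need to be retried after an interrupted run.
    """
    failed = []
    for doc in sorted(Path(directory).iterdir()):
        if not doc.is_file():
            continue  # skip subdirectories in this simple sketch
        result = run(build_post_command(doc, collection, **kwargs))
        if result.returncode != 0:
            failed.append(doc)
    return failed
```

Calling `post_directory("docs", "mycollection")` would then post each file separately and return the ones that failed. Starting one process per file is slower than a single bulk run, so this trades throughput for the ability to resume.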

Committing after every document is overkill and will hurt performance. In any case, it is strange that you have to resubmit everything from the start when something goes wrong. I suggest seriously reconsidering the indexing strategy you are using rather than looking for a different way to commit.

That said, if changing the commit configuration is your only option, I suggest configuring autoCommit in your Solr collection/index or using the commitWithin parameter, as suggested by @MatsLindh. Just check whether the tool you are using lets you add this parameter.

autoCommit

These settings control how often pending updates will be automatically pushed to the index. An alternative to autoCommit is to use commitWithin, which can be defined when making the update request to Solr (ie, when pushing documents), or in an update RequestHandler.
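To illustrate what defining commitWithin "when making the update request" might look like outside the post tool, here is a minimal Python sketch that builds a POST request to a collection's update handler. The base URL, collection name, JSON payload, and the helper name `build_update_request` are assumptions, not taken from the answers above.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_update_request(base_url, collection, docs_json, commit_within_ms=1000):
    """Build (but do not send) an update request carrying commitWithin.

    `docs_json` is a JSON string of documents, e.g. '[{"id": "1"}]'.
    """
    query = urlencode({"commitWithin": commit_within_ms})
    url = f"{base_url}/{collection}/update?{query}"
    return Request(
        url,
        data=docs_json.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it; for example, `build_update_request("http://localhost:8983/solr", "mycollection", '[{"id": "1"}]')` targets a local Solr instance on the default port, which is itself an assumption about your setup.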


 