Solr near-real-time search: impact of frequently reindexing the same documents

We want to use Solr in a near-real-time (NRT) scenario. For example, we want to filter and rank our results by number of views.

Solr soft commits were made for this use case, but:

  • In practice, the same few documents are updated very frequently (just for the nb_view field) while most of the documents are untouched.
  • As far as I know, each update, even a partial one, is implemented in Lucene as a full delete followed by a full re-add of the document.

It seems to me that having the same documents many times in the transaction log (tlog) is inefficient, and might also be problematic during merging (is the document marked as deleted and re-added n times?).
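The partial updates described above are what Solr calls atomic updates, sent as a JSON list of documents whose field carries a modifier such as `{"inc": n}`. One way to reduce how often the same document reaches the tlog (an assumption about the workload, not something the answer proposes) is to buffer view events on the client and send a single `inc` per document per batch. A minimal sketch, assuming a hypothetical `nb_view` field keyed by `id`; the helper names are hypothetical and actually posting the body to `/solr/<collection>/update` is left out.

```python
import json
from collections import Counter


def coalesce_views(view_events):
    """Count view events per document id, so each document is updated once per batch."""
    return Counter(view_events)


def build_inc_payload(counts, field="nb_view"):
    """Build a Solr atomic-update JSON body with one {"inc": n} per document.

    `field` is a hypothetical field name; documents are sorted by id only to
    make the output deterministic.
    """
    docs = [{"id": doc_id, field: {"inc": n}} for doc_id, n in sorted(counts.items())]
    return json.dumps(docs)


# Usage: four view events collapse into two document updates.
events = ["a", "b", "a", "a"]
payload = build_inc_payload(coalesce_views(events))
```

The trade-off is latency: view counts only reach Solr when the batch is flushed, which may be acceptable when soft commits already make changes visible only every few seconds.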

Any advice / good practice?

Two things you could use to support this scenario:

  1. In-place updates: only that field is updated, not the whole document. Check the conditions a field must meet before in-place updates can be used.
  2. ExternalFileField: you keep the values in an external file.
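The two options differ in where the view count lives. This sketch, assuming a hypothetical `nb_view` field keyed by `id` and the default schema's point type names (`pint`, `plong`, `pfloat`, `pdouble`), encodes Solr's per-field conditions for an in-place update (single-valued numeric docValues field that is neither indexed nor stored) and the plain `key=value` file format that ExternalFileField reads from a file named `external_<field>` in the core's data directory. It does not talk to Solr, and the helper names are hypothetical.

```python
# Assumption: numeric point type names as defined in Solr's default schema.
NUMERIC_TYPES = {"pint", "plong", "pfloat", "pdouble"}


def in_place_eligible(field):
    """Check the per-field conditions for an in-place update.

    The field must be numeric, docValues-only (not indexed, not stored) and
    single-valued. Solr also requires the _version_ field and any copyField
    targets to meet similar conditions; that is not checked here.
    """
    return (
        field.get("type") in NUMERIC_TYPES
        and field.get("docValues") is True
        and field.get("indexed") is False
        and field.get("stored") is False
        and field.get("multiValued", False) is False
    )


def external_file_name(field_name):
    """ExternalFileField looks for a file named external_<field name>."""
    return f"external_{field_name}"


def external_file_content(values):
    """Render one key=value line per document, sorted by key.

    The file does not have to be sorted, but sorted keys let Solr load it faster.
    """
    lines = [f"{key}={value}" for key, value in sorted(values.items())]
    return "\n".join(lines) + "\n"


# Usage: a docValues-only long field qualifies for in-place updates.
nb_view = {"name": "nb_view", "type": "plong", "indexed": False,
           "stored": False, "docValues": True}
eligible = in_place_eligible(nb_view)
content = external_file_content({"doc2": 5, "doc1": 42})
```

With ExternalFileField, the file is typically reloaded on commit by registering `org.apache.solr.schema.ExternalFileFieldReloader` as a `newSearcher`/`firstSearcher` listener in solrconfig.xml, and the values are available only through function queries (sorting, boosting, `{!frange}` filtering), not as a regular searchable field.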

If the scenario is critical, I would test both under real-world conditions if possible, and assess.


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM