How to find the delta between two Solr collections

We are using LucidWorks Solr version 4.6.

Our source system stores data in two destination systems: one fed in real time and the other in batch mode. Data is ingested into Solr through the real-time route.

We need to periodically sync the data ingested into Solr with the data ingested into the batch system.

The design we are currently evaluating is to import the data from the batch system into a second Solr collection, but we are not sure how to sync the two collections (i.e. the one holding the real-time data and the one loaded through batch import).

I read through the data import handlers, but they would overwrite the existing data in Solr. Is there any way to identify the delta between the two collections and ingest only that?
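One way to think about "the delta" is as a comparison of document IDs plus a per-document fingerprint (for example a last-modified value or a content hash) taken from each collection. The sketch below assumes you have already pulled `{id: fingerprint}` maps out of both collections, e.g. by paging through `q=*:*` with a restricted `fl`; the fingerprint field, the `compute_delta` function and the `Delta` type are hypothetical, not part of Solr.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    missing: List[str]  # IDs present in the batch collection but not the real-time one
    changed: List[str]  # IDs present in both whose fingerprints differ
    extra: List[str]    # IDs present in the real-time collection but not the batch one


def compute_delta(realtime: Dict[str, str], batch: Dict[str, str]) -> Delta:
    """Compare two {doc_id: fingerprint} maps (hypothetical helper).

    The batch side is treated as the reference copy; dict key views
    support set operations, so the ID comparisons are plain set math.
    """
    rt_ids = realtime.keys()
    batch_ids = batch.keys()
    missing = sorted(batch_ids - rt_ids)
    extra = sorted(rt_ids - batch_ids)
    # Only IDs in both collections can have differing fingerprints.
    changed = sorted(i for i in batch_ids & rt_ids if batch[i] != realtime[i])
    return Delta(missing, changed, extra)


if __name__ == "__main__":
    rt = {"a": "h1", "b": "h2"}
    bt = {"a": "h1", "b": "h9", "c": "h3"}
    print(compute_delta(rt, bt))
```

With this framing, only the documents listed in `missing` and `changed` would need to be re-indexed into whichever collection you treat as the target, instead of overwriting everything.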

There is no good way to do this, but there are a couple of things you can do:

  1. When data comes into the real-time system, store an import timestamp on each document. Then run a range query on that field to pull in only the new documents. I think newer versions of Solr already have a field for this.
  2. Log the IDs of documents going into the first Solr collection, and then index only those into the second.
  3. Use a separate queue to feed the other collection.
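Option 1 could look roughly like the sketch below. It assumes a date field named `timestamp` that is set when each document is indexed, plus a hypothetical host and collection name; adjust all three to your schema. It only builds standard `/select` requests with a `fq` range filter and pages with `start`/`rows`, since `cursorMark` deep paging is not available in 4.6.

```python
import json
from datetime import datetime, timezone
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # hypothetical host
COLLECTION = "realtime"                   # hypothetical collection name
TS_FIELD = "timestamp"                    # hypothetical date field set at index time


def solr_date(dt: datetime) -> str:
    """Format a datetime as a UTC ISO 8601 string with 'Z', as Solr date fields expect."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def delta_query_url(since: datetime, start: int = 0, rows: int = 500) -> str:
    """Build a /select URL for documents whose timestamp is at or after `since`."""
    params = {
        "q": "*:*",
        "fq": f"{TS_FIELD}:[{solr_date(since)} TO *]",  # inclusive range, open upper end
        "fl": "id," + TS_FIELD,
        "sort": "id asc",  # stable order so start/rows paging is repeatable
        "start": start,
        "rows": rows,
        "wt": "json",
    }
    return f"{SOLR_BASE}/{COLLECTION}/select?{urlencode(params)}"


def fetch_delta(since: datetime, rows: int = 500):
    """Yield every document in the delta, one page at a time."""
    start = 0
    while True:
        with urlopen(delta_query_url(since, start, rows)) as resp:
            body = json.load(resp)["response"]
        yield from body["docs"]
        start += rows
        if start >= body["numFound"]:
            break
```

You would record the time of each successful sync and pass it as `since` on the next run. Because the real-time side keeps indexing while you page, documents added mid-run may shift pages, so treating the window as inclusive and re-processing a few duplicates is the safer choice.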
