How Lucene Data Replication Works on Technologies Like ElasticSearch and Apache Solr

In a high-availability environment, how do these technologies replicate Lucene data? And how could I replicate my own Lucene directories, given that I don't currently use either of them?

That question is probably too broad to answer in much detail, but in general you have two options:

  • Index documents on a master node, then replicate the index files that have changed to all other nodes. This is usually known as a master/slave setup. The first versions of Solr used rsync for this, so Solr itself didn't need to know anything about replication. Later versions replicated the index files over HTTP instead. If you already have a Lucene index that you want to make available on more nodes, this is the easiest option and doesn't require fundamental changes to your project.

  • Distribute each document that's going to be added to the index to all known replicas of that index/shard. Indexing then happens on every node: the document is sent to each node before it has been added to the index there. This is (simplified) what happens when Solr runs in cloud / cluster mode (and is what ES does as well IIRC). Transaction logs etc. are also involved to make this more resilient to node failures.
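The first option (copying changed index files from a master) can be sketched in a few lines. This is a minimal, hypothetical sketch, not how Solr implements it: it assumes both index directories are reachable on the local filesystem (in practice the copy would go over rsync or HTTP), that the master is not committing or merging while it runs, and it relies on Lucene index files being write-once. The function names `files_to_copy`, `copy_atomically` and `replicate` are made up for illustration. The one design choice that matters is ordering: segment data files are copied first and the `segments_N` commit file last, so a replica never sees a commit point that references files it doesn't have yet.

```python
import os
import shutil
from pathlib import Path

# Lucene's lock file belongs to the local writer and must never be copied.
SKIP = {"write.lock"}


def is_commit_file(name: str) -> bool:
    # segments_N (and segments.gen in old Lucene versions) describe the commit point.
    return name.startswith("segments")


def files_to_copy(master: Path, replica: Path) -> list[str]:
    """Files the replica is missing, or has with a different size (hypothetical helper)."""
    out = []
    for f in sorted(master.iterdir()):
        if not f.is_file() or f.name in SKIP:
            continue
        target = replica / f.name
        if not target.exists() or target.stat().st_size != f.stat().st_size:
            out.append(f.name)
    return out


def copy_atomically(src: Path, dst: Path) -> None:
    # Write to a temporary name, then rename, so readers never see a half-written file.
    tmp = dst.with_name(dst.name + ".tmp")
    shutil.copyfile(src, tmp)
    os.replace(tmp, dst)


def replicate(master: Path, replica: Path) -> list[str]:
    """Bring `replica` in line with `master`; returns the names that were copied."""
    replica.mkdir(parents=True, exist_ok=True)
    names = files_to_copy(master, replica)
    # Stable sort on a bool key: data files (False) first, commit files (True) last.
    for name in sorted(names, key=is_commit_file):
        copy_atomically(master / name, replica / name)
    # Finally drop files the master no longer has (merged-away segments, old commits).
    master_names = {f.name for f in master.iterdir() if f.is_file()}
    for f in list(replica.iterdir()):
        if f.is_file() and f.name not in master_names and f.name not in SKIP:
            f.unlink()
    return names
```

Calling `replicate(Path("/data/master/index"), Path("/data/replica/index"))` (hypothetical paths) after each commit on the master, followed by reopening the searcher on the replica, is essentially what the old rsync-based setups did.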

So either distribute the updates themselves or distribute the updated index.
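The other approach, distributing the updates themselves, can be illustrated with a toy model. This is a hypothetical sketch: `Shard`, `Replica`, `apply` and `recover` are invented names, not Solr or Elasticsearch APIs, and a plain dict stands in for each node's local Lucene index. Every update gets a version number, is written to each live replica's transaction log and indexed there, and a replica that was down catches up by replaying the entries it missed from the leader's log. Real systems additionally deal with leader election, concurrent writers and durability, and typically fall back to copying the whole index (the first option) when a replica is too far behind.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """One copy of a shard; `index` is a stand-in for a local Lucene index."""
    name: str
    up: bool = True
    index: dict = field(default_factory=dict)
    tlog: list = field(default_factory=list)  # (version, doc_id, doc) entries applied here

    def apply(self, version: int, doc_id: str, doc: dict) -> None:
        # Log first, then index, so a crash between the two can be replayed from the tlog.
        self.tlog.append((version, doc_id, doc))
        self.index[doc_id] = doc

    @property
    def last_version(self) -> int:
        return self.tlog[-1][0] if self.tlog else 0


class Shard:
    """The first replica acts as leader; every update is indexed on all live replicas."""

    def __init__(self, replicas: list[Replica]):
        self.leader, *self.followers = replicas
        self.version = 0

    def add(self, doc_id: str, doc: dict) -> None:
        self.version += 1
        self.leader.apply(self.version, doc_id, doc)
        for r in self.followers:
            if r.up:
                r.apply(self.version, doc_id, doc)

    def recover(self, r: Replica) -> int:
        """Replay the leader's tlog entries a returning replica missed; returns how many."""
        missed = [e for e in self.leader.tlog if e[0] > r.last_version]
        for entry in missed:
            r.apply(*entry)
        r.up = True
        return len(missed)
```

Unlike the file-copying sketch, no index files move between nodes here; each replica builds its own index from the same stream of updates.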

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM