

How Lucene Data Replication Works on Technologies Like ElasticSearch and Apache Solr

In a high-availability environment, how do these technologies replicate Lucene data? And given that I don't currently use either of them, how could I replicate my Lucene directories myself?

That question is probably too broad to answer in full, but in general you have two options:

  • Index the document on a master node, then replicate the changed index files to all other nodes. These are usually known as master/slave setups. The first versions of Solr used rsync to do this, so Solr itself didn't have to know anything about replication. Later versions replicated the index files over HTTP instead. If you already have a Lucene index that you want to make available on more nodes, this is the easiest solution, since it doesn't require fundamental changes to your project.

  • Distribute the document that is going to be added to the index to all known replicas of that index/shard. The indexing process then happens on each node: the document is distributed to a node before it has been added to the index. This is, simplified, what happens when Solr runs in cloud/cluster mode (and, IIRC, what Elasticsearch does as well). Transaction logs are also involved here to make the process more resilient to failures across nodes.
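The first option, replicating changed index files from a master, can be sketched roughly as follows. This is an illustrative toy, not Solr's actual replication handler: it decides what to copy by comparing file sizes only, and the function name and directory layout are assumptions.

```python
# Minimal sketch of master/slave index replication: copy only the index
# files that are missing or changed on the replica, the way Solr's
# rsync/HTTP replication ships new segment files to slaves.
import os
import shutil

def replicate_index(master_dir: str, replica_dir: str) -> list[str]:
    """Copy files the replica lacks or that differ in size; return names copied."""
    os.makedirs(replica_dir, exist_ok=True)
    copied = []
    for name in sorted(os.listdir(master_dir)):
        src = os.path.join(master_dir, name)
        dst = os.path.join(replica_dir, name)
        # A file needs copying if the replica does not have it yet,
        # or if its size differs from the master's copy.
        if not os.path.exists(dst) or os.path.getsize(src) != os.path.getsize(dst):
            shutil.copy2(src, dst)
            copied.append(name)
    return copied
```

Because Lucene segment files are immutable once written, a size comparison like this is usually enough to detect new segments; a second run against an up-to-date replica copies nothing.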

So: either distribute the updates themselves, or distribute the updated index.
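The second option, distributing each document to every replica before indexing, can be sketched like this. The class names, the toy inverted index, and the transaction-log list are all illustrative assumptions, not how SolrCloud or Elasticsearch are actually implemented:

```python
# Sketch of document-level replication: the raw document is forwarded to
# all replicas, and each replica runs its own indexing step. A simple
# per-node transaction log records documents for replay after a crash.
class ReplicaNode:
    def __init__(self, name: str):
        self.name = name
        self.index = {}   # term -> set of doc ids (a toy inverted index)
        self.tlog = []    # transaction log, appended to before indexing

    def add_document(self, doc_id: str, text: str) -> None:
        self.tlog.append((doc_id, text))  # log first, then index
        for term in text.lower().split():
            self.index.setdefault(term, set()).add(doc_id)

class Cluster:
    def __init__(self, nodes: list[ReplicaNode]):
        self.nodes = nodes

    def index_document(self, doc_id: str, text: str) -> None:
        # Distribute the document itself, not index files: every node
        # does the analysis/indexing work independently.
        for node in self.nodes:
            node.add_document(doc_id, text)
```

The key contrast with the master/slave sketch is what travels over the wire: here it is the document, so every node pays the indexing cost, but no node depends on another's index files.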

