How to reindex data from one Elasticsearch cluster to another with elasticsearch-hadoop in Spark
I have two separate Elasticsearch clusters, and I want to reindex the data from the first cluster into the second. However, I found that I can only set up one Elasticsearch cluster in the SparkContext configuration, such as:
val sparkConf: SparkConf = new SparkConf()
  .setAppName("EsReIndex")
sparkConf.set("es.nodes", "node1.cluster1:9200")
So how can I move data between two Elasticsearch clusters with elasticsearch-hadoop in Spark inside the same application?
You don't need to configure the node address inside the SparkConf for this.
When you use your DataFrameReader and DataFrameWriter with the elasticsearch format, you can pass the node address as an option, as follows:
// Read from the source cluster.
val df = sqlContext.read
  .format("elasticsearch")
  .option("es.nodes", "node1.cluster1:9200")
  .load("your_index/your_type")

// Write to the destination cluster (the format must be set here as well,
// otherwise Spark falls back to its default data source).
df.write
  .format("elasticsearch")
  .option("es.nodes", "node2.cluster2:9200")
  .save("your_new_index/your_new_type")
This should work with Spark 1.6.x and the corresponding elasticsearch-hadoop connector.
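Putting the two pieces together, here is a minimal, self-contained sketch of such a reindexing job against the Spark 1.6.x APIs. The cluster addresses and the index/type names are placeholders taken from the snippets above, and it assumes the elasticsearch-spark connector is on the application's classpath:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object EsReIndex {
  def main(args: Array[String]): Unit = {
    // No es.nodes in the SparkConf: each read/write names its own cluster.
    val sc = new SparkContext(new SparkConf().setAppName("EsReIndex"))
    val sqlContext = new SQLContext(sc)

    // Read every document of the source index from the first cluster.
    val df = sqlContext.read
      .format("elasticsearch")
      .option("es.nodes", "node1.cluster1:9200")
      .load("your_index/your_type")

    // Write the same documents into the destination index on the second cluster.
    df.write
      .format("elasticsearch")
      .option("es.nodes", "node2.cluster2:9200")
      .save("your_new_index/your_new_type")

    sc.stop()
  }
}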