How to reindex data from one Elasticsearch cluster to another with elasticsearch-hadoop in Spark
I have two separate Elasticsearch clusters, and I want to reindex the data from the first cluster into the second. However, I found that I can only set up one Elasticsearch cluster in the SparkContext configuration, such as:
val sparkConf: SparkConf = new SparkConf()
  .setAppName("EsReIndex")
sparkConf.set("es.nodes", "node1.cluster1:9200")
So how can I move data between two Elasticsearch clusters with elasticsearch-hadoop in Spark inside the same application?
You don't need to configure the node address inside the SparkConf for this.
When you use your DataFrameReader and DataFrameWriter with the elasticsearch format, you can pass the node address as an option, as follows:
// Read from the source cluster.
val df = sqlContext.read
  .format("elasticsearch")
  .option("es.nodes", "node1.cluster1:9200")
  .load("your_index/your_type")

// Write to the destination cluster (the format must be set here as well,
// otherwise Spark falls back to its default data source).
df.write
  .format("elasticsearch")
  .option("es.nodes", "node2.cluster2:9200")
  .save("your_new_index/your_new_type")
This should work with Spark 1.6.x and the corresponding elasticsearch-hadoop connector.
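Putting the two pieces together, here is a minimal, self-contained sketch of such a reindexing job against the Spark 1.6.x APIs. The cluster addresses and the index/type names are placeholders taken from the snippets above, and it assumes the elasticsearch-spark connector is on the application's classpath:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object EsReIndex {
  def main(args: Array[String]): Unit = {
    // No es.nodes in the SparkConf: each read/write names its own cluster.
    val sc = new SparkContext(new SparkConf().setAppName("EsReIndex"))
    val sqlContext = new SQLContext(sc)

    // Read every document of the source index from the first cluster.
    val df = sqlContext.read
      .format("elasticsearch")
      .option("es.nodes", "node1.cluster1:9200")
      .load("your_index/your_type")

    // Write the same documents into the destination index on the second cluster.
    df.write
      .format("elasticsearch")
      .option("es.nodes", "node2.cluster2:9200")
      .save("your_new_index/your_new_type")

    sc.stop()
  }
}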