
How to reindex data from one Elasticsearch cluster to another with elasticsearch-hadoop in Spark

I have two separate Elasticsearch clusters, and I want to reindex the data from the first cluster into the second. However, I found that I can only set up one Elasticsearch cluster in the SparkContext configuration, for example:

import org.apache.spark.SparkConf

// The application-wide configuration holds a single es.nodes value
val sparkConf: SparkConf = new SparkConf()
  .setAppName("EsReIndex")
  .set("es.nodes", "node1.cluster1:9200")

So how can I move data between two Elasticsearch clusters with elasticsearch-hadoop in Spark, within the same application?

You don't need to configure the node address in the SparkConf for this.

When you use a DataFrameReader or DataFrameWriter with the elasticsearch format, you can pass the node address as an option, as follows:

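// Read from the first cluster; the node address is set on this read only
val df = sqlContext.read
  .format("elasticsearch")
  .option("es.nodes", "node1.cluster1:9200")
  .load("your_index/your_type")

// Write to the second cluster; the format is set here as well,
// otherwise Spark falls back to its default data source
df.write
  .format("elasticsearch")
  .option("es.nodes", "node2.cluster2:9200")
  .save("your_new_index/your_new_type")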

This should work with Spark 1.6.x and the corresponding elasticsearch-hadoop connector.
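
The same idea can be wrapped in a small helper that keeps each cluster's settings in its own map, so that nothing Elasticsearch-specific lives in the SparkConf. This is only a sketch under a few assumptions: Spark 1.6.x and a matching elasticsearch-hadoop jar are on the classpath, the object name `EsReindexSketch` and the function `reindex` are hypothetical, and the data source is referenced by the connector's package name `org.elasticsearch.spark.sql` rather than a short alias.

```scala
import org.apache.spark.sql.SQLContext

object EsReindexSketch {
  // Assumption: package-qualified data source name of the elasticsearch-hadoop connector
  val EsFormat = "org.elasticsearch.spark.sql"

  // Per-cluster settings, kept apart so each read or write carries its own node address
  val sourceCluster: Map[String, String] = Map("es.nodes" -> "node1.cluster1:9200")
  val targetCluster: Map[String, String] = Map("es.nodes" -> "node2.cluster2:9200")

  // Hypothetical helper: loads sourceResource from the first cluster
  // and saves it as targetResource on the second one
  def reindex(sqlContext: SQLContext, sourceResource: String, targetResource: String): Unit = {
    val df = sqlContext.read
      .format(EsFormat)
      .options(sourceCluster)
      .load(sourceResource)

    df.write
      .format(EsFormat)
      .options(targetCluster)
      .save(targetResource)
  }
}
```

It would be called as `EsReindexSketch.reindex(sqlContext, "your_index/your_type", "your_new_index/your_new_type")`. Any further per-cluster setting can be added to the corresponding map without touching the shared configuration.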

Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish this post, please credit this site's URL or the address of the original article.
