
How to read from multiple Elasticsearch indices in Spark?

I need to read data from multiple Elasticsearch indices, all of which share the same data structure.

For example:

val df1 = spark.read.format("org.elasticsearch.spark.sql")
              .option("query", myquery)
              .option("pushdown", "true")
              .load("news_01/myitem")

val df2 = spark.read.format("org.elasticsearch.spark.sql")
              .option("query", myquery)
              .option("pushdown", "true")
              .load("news_02/myitem")

What if I am given an array of index names, such as ["news_01", "news_02"]?

How can I avoid creating df1 and df2 separately, as I do now?

Given that Elasticsearch lets you target multiple indices in a single search request, you could do something like:

val df = spark.read.format("org.elasticsearch.spark.sql")
              .option("query", myquery)
              .option("pushdown", "true")
              .load("news_01,news_02")
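Since the question starts from an array of index names, one option is to join that array into the comma-separated resource string the connector expects. This is a minimal sketch; the `spark` session and `myquery` are assumed to exist as in the snippets above:

```scala
// Array of index names, as described in the question.
val indices = Array("news_01", "news_02")

// Join them into a single comma-separated resource string.
val resource = indices.mkString(",")
println(resource) // news_01,news_02

// Then load them all with a single reader (assumes spark and myquery
// are in scope, and that the elasticsearch-spark connector is on the classpath):
// val df = spark.read.format("org.elasticsearch.spark.sql")
//               .option("query", myquery)
//               .option("pushdown", "true")
//               .load(resource)
```

This scales to any number of indices without declaring a DataFrame per index.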
