How to get entities' index in ElasticSearch-Spark

Question

I am using the elasticsearch-hadoop/spark library to create Spark RDDs from ElasticSearch queries.

The esRDD method returns the raw document (_source, in ElasticSearch terms) and the document's id (_id in ES), but I also need additional information about the returned documents, such as the ElasticSearch index and type each document comes from (this information is always available from the ES REST API).

How can I get the index and type information of documents in the RDD returned by the esRDD method?

EDIT
I am querying multiple indices, i.e. my call to esRDD looks like this:

sparkContext.esRDD("index*/entities", query)

and the actual indices are "index1", "index2", etc. So, I want to know which specific index each of the entities in the resulting RDD came from.

Answer 1

In case anyone stumbles upon this in the future:

The solution was to set the es.read.metadata setting to true (it is described in the elasticsearch-hadoop configuration reference). This adds a _metadata field to each document in the esRDD, which contains information such as the document's index, type, id and version.
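As a rough sketch of how this could look in Scala: the block below enables es.read.metadata on the Spark configuration and then reads the index name out of each document. The object name IndexPerDocument, the helper indexOf, the es.nodes address and the query string are placeholders introduced here for illustration, not part of the original setup. It also assumes the metadata arrives as a nested map under the default "_metadata" key with an "_index" entry, which is worth verifying against the connector version in use.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.elasticsearch.spark._ // adds esRDD to SparkContext

object IndexPerDocument {

  // Hypothetical helper: returns the value stored under "_metadata" -> "_index",
  // or None if the document carries no metadata map or no "_index" entry.
  def indexOf(doc: scala.collection.Map[String, AnyRef]): Option[String] =
    doc.get("_metadata") match {
      case Some(meta: scala.collection.Map[_, _]) =>
        meta.asInstanceOf[scala.collection.Map[String, Any]]
          .get("_index")
          .map(_.toString)
      case _ => None
    }

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("es-index-per-document")
      .set("es.nodes", "localhost:9200") // assumption: a local ES node
      .set("es.read.metadata", "true")   // ask the connector to include _metadata

    // The Spark master is expected to be supplied externally, e.g. by spark-submit.
    val sc = new SparkContext(conf)

    val query = "?q=*" // placeholder query; substitute the real one

    // Each element is (document id, document fields as a map).
    val docs = sc.esRDD("index*/entities", query)

    // Pair every document id with the index it came from.
    val idToIndex = docs.map { case (id, doc) => (id, indexOf(doc)) }
    idToIndex.take(5).foreach(println)

    sc.stop()
  }
}
```

Keeping the lookup in a small pure function means it can be checked without a running cluster, and returning an Option avoids a failure if a document happens to lack the metadata field.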

solution1
0 ACCEPTED 2017-01-08 21:02:41
