
How to get entities' index in ElasticSearch-Spark

I am using the elasticsearch-hadoop/spark library to create Spark RDDs from ElasticSearch queries.

The esRDD method returns the raw document (_source, in ElasticSearch terms) and the document's id (_id in ES). However, I also need additional information about the returned documents, such as the ElasticSearch index and type each document comes from (this information is always available from the ES REST API).

How can I get the index and type information of documents in the RDD returned by the esRDD method?

EDIT
I am querying multiple indices, i.e., my call to esRDD looks like this:

sparkContext.esRDD("index*/entities", query)

where the actual indices are "index1", "index2", etc. So I want to know which specific index each entity in the resulting RDD came from.

In case anyone stumbles upon this in the future:

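The solution was to set the es.read.metadata setting to true (see the elasticsearch-hadoop configuration documentation). This adds a _metadata field to each document in the esRDD, which contains information such as the document's index, type and id (the version is only included if es.read.metadata.version is also set to true).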
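Here is a minimal Scala sketch of reading the index back out of each document. It assumes es.read.metadata is enabled and that the metadata field keeps its default name _metadata. The names EsMetadata, metadataValue and indexOf are hypothetical helpers, and the sample documents are made up to mirror the (id, document) pairs esRDD returns, so the lookup logic can run without a cluster.

```scala
import scala.collection.{Map => CMap}

// Sketch only: assumes es.read.metadata=true and the default metadata field
// name "_metadata" (configurable via es.read.metadata.field).
object EsMetadata {
  // Hypothetical helper: look up one metadata entry (e.g. "_index", "_type", "_id")
  // inside the "_metadata" entry of a document map returned by esRDD.
  def metadataValue(doc: CMap[String, AnyRef], key: String): Option[String] =
    doc.get("_metadata") match {
      case Some(meta: CMap[_, _]) =>
        meta.asInstanceOf[CMap[String, AnyRef]].get(key).map(_.toString)
      // Defensive branch in case the metadata arrives as a Java map instead.
      case Some(meta: java.util.Map[_, _]) =>
        Option(meta.get(key)).map(_.toString)
      case _ => None
    }

  // Concrete index a document came from, e.g. "index1" for a query on "index*/entities".
  def indexOf(doc: CMap[String, AnyRef]): Option[String] = metadataValue(doc, "_index")

  def main(args: Array[String]): Unit = {
    // Made-up sample pairs shaped like the (id, document) tuples esRDD returns.
    val docs: Seq[(String, CMap[String, AnyRef])] = Seq(
      "1" -> CMap[String, AnyRef](
        "name" -> "first",
        "_metadata" -> CMap[String, AnyRef]("_index" -> "index1", "_type" -> "entities", "_id" -> "1")),
      "2" -> CMap[String, AnyRef](
        "name" -> "second",
        "_metadata" -> CMap[String, AnyRef]("_index" -> "index2", "_type" -> "entities", "_id" -> "2"))
    )
    docs.foreach { case (id, doc) =>
      println(s"$id -> ${indexOf(doc).getOrElse("unknown")}")
    }
    // prints "1 -> index1" and "2 -> index2"

    // In a real job (requires the elasticsearch-spark connector on the classpath):
    //   import org.elasticsearch.spark._
    //   val rdd = sc.esRDD("index*/entities", query, Map("es.read.metadata" -> "true"))
    //   val idsWithIndex = rdd.map { case (id, doc) => (id, EsMetadata.indexOf(doc)) }
  }
}
```

Passing es.read.metadata through the per-call configuration map, as in the commented call, keeps it scoped to that one read; setting it on the SparkConf instead would apply it to every esRDD call in the job.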
