
SparkContext object has no attribute esRDD (elasticsearch-spark connector)

In the spark-shell, I have successfully used the elasticsearch-hadoop connector (specifically the one developed for Spark: elasticsearch-spark-20_2.11-5.1.2.jar). I invoke Spark with:

] $SPARK_HOME/bin/spark-shell --master local[2] --jars ~/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar

In the Scala shell:

scala> import org.elasticsearch.spark._
scala> val es_rdd = sc.esRDD("myindex/mytype",query="myquery")

It works perfectly. I want to do the same with PySpark. I tried:

] $SPARK_HOME/bin/pyspark --master local[2] --driver-class-path=/home/pat/spark/jars/elasticsearch-spark-20_2.11-5.1.2.jar

but in the Python shell, calling the esRDD method is not possible:

>>> sc.esRDD
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  AttributeError: 'SparkContext' object has no attribute 'esRDD'

The jar library was loaded, though, because this call works:

>>> conf = {"es.resource" : "myindex/mytype", "es.nodes" : "localhost"}
>>> rdd = sc.newAPIHadoopRDD("org.elasticsearch.hadoop.mr.EsInputFormat","org.apache.hadoop.io.NullWritable","org.elasticsearch.hadoop.mr.LinkedMapWritable", conf=conf)

Does someone know how to use esRDD() in PySpark?

esRDD doesn't actually exist in PySpark.

It only works in Spark's Scala API, where you need the following imports:

import org.apache.spark.SparkContext._

import org.elasticsearch.spark._ 

Now you can read data:

val rdd = sc.esRDD("index_name/doc_type")
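In PySpark, the closest equivalent is the Hadoop InputFormat approach already shown in the question, with the query passed through the es.query setting of elasticsearch-hadoop. A minimal sketch, assuming the same elasticsearch-spark jar is on the class path; the index, node address, and query below are placeholders:

# PySpark equivalent of sc.esRDD("myindex/mytype", query="myquery"):
# read through the elasticsearch-hadoop InputFormat and pass the query in the conf.
conf = {
    "es.resource": "myindex/mytype",             # placeholder index/type
    "es.nodes": "localhost",                     # placeholder Elasticsearch node
    "es.query": '{"query": {"match_all": {}}}'   # placeholder query (query DSL JSON or ?uri_query)
}

rdd = sc.newAPIHadoopRDD(
    "org.elasticsearch.hadoop.mr.EsInputFormat",
    "org.apache.hadoop.io.NullWritable",
    "org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf=conf)

# Each element should come back as a (document id, field dict) pair.
print(rdd.first())

This is not esRDD itself, just the same connector driven through the generic Hadoop input API, which is what the Python side of Spark exposes.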
