
Fail to apply mapping on an RDD on multiple spark nodes through Elasticsearch-hadoop library

import org.elasticsearch.spark._
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer._
import com.esotericsoftware.kryo.Kryo
import org.elasticsearch.spark.rdd.EsSpark

sc.stop()

val conf = new SparkConf()
conf.set("es.index.auto.create","true")
conf.set("spark.serializer", classOf[KryoSerializer].getName)

conf.set("es.nodes","localhost")
val sc = new SparkContext(conf)

val getAllQuery = "{\"query\":{\"match_all\":{}}}"
val esRDDAll = sc.esRDD("test-index/typeA", getAllQuery)

//WORKS
esRDDAll.count

//WORKS
EsSpark.saveToEs(esRDDAll, "output-index/typeB")

val esRDDMap = esRDDAll.map(r => r)

//FAILS
esRDDMap.count

//FAILS
EsSpark.saveToEs(esRDDMap, "output-index/typeB")

The error I am getting is:

WARN TaskSetManager: Lost task 0.0 in stage 4.0 (TID 41, localhost): java.lang.ClassNotFoundException: $line594.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$anonfun$1
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        at java.lang.Class.forName0(Native Method)
        at java.lang.Class.forName(Class.java:348)
        at org.apache.spark.serializer.JavaDeserializationStream$$anon$1.resolveClass(JavaSerializer.scala:68)
        at java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1613)
        at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1518)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1774)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2000)
        at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1924)
        at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1801)
        at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1351)
        at java.io.ObjectInputStream.readObject(ObjectInputStream.java:371)
        at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:76)
        at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:115)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:89)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:213)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

Note

This only occurs when I am using master-slave mode in Spark. On a single node it works fine.
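That symptom is consistent with the REPL-generated closure class (the `$line594...$anonfun$1` in the stack trace) not being visible to remote executors: spark-shell serves classes it compiles through a class server whose location is stored in the shell's original `SparkConf` (e.g. `spark.repl.class.uri` in Spark 1.x), and building a fresh `SparkConf` from scratch drops that setting. A hedged workaround sketch, assuming Spark 1.x spark-shell (the exact property name varies by version), is to clone the existing conf instead of creating a new one:

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.serializer.KryoSerializer

// Clone the shell's conf so REPL settings (including the class-server
// URI that lets executors fetch shell-compiled classes) are preserved.
val conf = sc.getConf.clone()
sc.stop()

conf.set("es.index.auto.create", "true")
conf.set("es.nodes", "localhost")
conf.set("spark.serializer", classOf[KryoSerializer].getName)

// Recreate the context with the inherited REPL settings intact.
val sc2 = new SparkContext(conf)
```

Alternatively, packaging the mapping function in a jar and submitting it with `--jars` (or via `spark-submit`) avoids REPL-generated classes altogether.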

I have encountered similar issues, and I hope these two links might help:

