
Getting ClassCastException in Spark ML: scala.collection.immutable.List to scala.collection.Seq

I am getting the exception below when trying to train a linear regression model. (The same code executes fine when I train the model in a separate JVM.)

org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 13.0 failed 4 times, most recent failure: Lost task 0.3 in stage 13.0 (TID 28, impetus-dsrv07.impetus.co.in, executor 2): java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1499)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1487)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1486)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1486)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1714)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1669)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1658)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2022)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2043)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2062)
    at org.apache.spark.sql.execution.SparkPlan.executeTake(SparkPlan.scala:336)
    at org.apache.spark.sql.execution.CollectLimitExec.executeCollect(limit.scala:38)
    at org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:2853)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153)
    at org.apache.spark.sql.Dataset$$anonfun$head$1.apply(Dataset.scala:2153)
    at org.apache.spark.sql.Dataset$$anonfun$55.apply(Dataset.scala:2837)
    at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:65)
    at org.apache.spark.sql.Dataset.withAction(Dataset.scala:2836)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2153)
    at org.apache.spark.sql.Dataset.head(Dataset.scala:2160)
    at org.apache.spark.sql.Dataset.first(Dataset.scala:2167)
    at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:198)
    at org.apache.spark.ml.regression.LinearRegression.train(LinearRegression.scala:76)
    at org.apache.spark.ml.Predictor.fit(Predictor.scala:118)
    at com.impetus.idw.turin.spark2.ml.algo.LRTrainer.trainLR(LRTrainer.java:88)
    at com.impetus.idw.turin.spark2.ml.algo.LRTrainer.processLRTraining(LRTrainer.java:83)
    at com.impetus.idw.turin.spark2.ml.algo.LRTrainer.execute(LRTrainer.java:54)
    at com.impetus.idw.turin.core.Sequence.runSequence(Sequence.java:122)
    at com.impetus.idw.turin.core.Status.runStatus(Status.java:93)
    at com.impetus.idw.turin.core.Action.runAction(Action.java:83)
    at com.impetus.idw.turin.core.Node.runNode(Node.java:156)
    at com.impetus.idw.turin.core.Node.run(Node.java:96)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassCastException: cannot assign instance of scala.collection.immutable.List$SerializationProxy to field org.apache.spark.rdd.RDD.org$apache$spark$rdd$RDD$$dependencies_ of type scala.collection.Seq in instance of org.apache.spark.rdd.MapPartitionsRDD
    at java.io.ObjectStreamClass$FieldReflector.setObjFieldValues(ObjectStreamClass.java:2133)
    at java.io.ObjectStreamClass.setObjFieldValues(ObjectStreamClass.java:1305)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2251)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:2245)
    at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:2169)
    at java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:2027)
    at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1535)
    at java.io.ObjectInputStream.readObject(ObjectInputStream.java:422)
    at org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:75)
    at org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:114)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:80)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
    ... 3 common frames omitted
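
The cast fails inside plain Java deserialization: `ObjectInputStream.defaultReadFields` assigns the deserialized `List` to the `Seq`-typed `dependencies_` field reflectively, and that assignment only breaks when the `List` class seen by the executor was loaded by a different classloader (or compiled against a different Scala binary version) than the class that declared the field. A minimal sketch of the same reflective round-trip, using a hypothetical `Holder` class, which succeeds because a single classloader is involved:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class SerializationSketch {
    // The field is typed as the interface List; on deserialization the
    // concrete object is assigned to it via reflection — the same
    // defaultReadFields code path seen in the stack trace above.
    static class Holder implements Serializable {
        List<String> deps = new ArrayList<>(Arrays.asList("a", "b"));
    }

    public static void main(String[] args) throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(new Holder());
        }
        try (ObjectInputStream ois = new ObjectInputStream(
                new ByteArrayInputStream(bos.toByteArray()))) {
            Holder h = (Holder) ois.readObject();
            // Same classloader on both sides, so the field assignment works.
            System.out.println(h.deps);
        }
    }
}
```

In the Spark case the two sides are the driver and the executor, so any mismatch in the jars shipped to the executors (or in the Scala version they were built with) surfaces at exactly this point.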

I am using the code below to create the training dataset. Here, inputDS is a CSV dataset...

Dataset<Row> data1 = inputDS.select(label, features);
Dataset<Row> data2 = data1.withColumn("label", data1.col(label).cast("Double"));
Dataset<Row> data3 = data2.map(new MapFunction<Row, Row>() {
    @Override
    public Row call(Row row) throws Exception {
        double label = row.getAs("label");
        double prediction = row.getAs("prediction");
        DenseVector features = row.getAs("features");
        return RowFactory.create(label, features.toArray(), prediction);
    }
}, Encoders.bean(Row.class));

The exception is thrown at this point:

lrModel = lRegression.fit(ds);

Try downgrading your Scala version to 2.10. Alternatively, inspect your code (Analyze -> Inspect code...) and fix any deprecated serialization-related methods.
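
If the project is built with Maven, "reducing the Scala version" concretely means keeping the Scala binary suffix of every Spark (and other Scala-based) artifact consistent with the compiler version; mixing suffixes such as `_2.10` and `_2.11` on the classpath is a common source of exactly this kind of ClassCastException. A hypothetical `pom.xml` fragment (the versions shown are illustrative, not taken from the question):

```xml
<properties>
  <!-- Pick one Scala binary version and use it everywhere. -->
  <scala.binary.version>2.10</scala.binary.version>
  <spark.version>2.2.0</spark.version>
</properties>

<dependencies>
  <dependency>
    <groupId>org.apache.spark</groupId>
    <!-- The artifactId suffix must match scala.binary.version. -->
    <artifactId>spark-mllib_${scala.binary.version}</artifactId>
    <version>${spark.version}</version>
  </dependency>
</dependencies>
```

Checking the dependency tree (`mvn dependency:tree | grep _2.1`) is a quick way to spot a stray artifact built against the wrong Scala version.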
