
ClassCastException: java.lang.Double cannot be cast to org.apache.spark.mllib.linalg.Vector while using LabeledPoint

I am trying to use SVMWithSGD to train my model, but I encounter a ClassCastException when the training runs. My train_data DataFrame schema looks like:

train_data.printSchema
root
 |-- label: string (nullable = true)
 |-- features: vector (nullable = true)
 |-- label_index: double (nullable = false)

I created a LabeledPoint RDD to use with SVMWithSGD:

    val targetInd = train_data.columns.indexOf("label_index")
    val featInd = Array("features").map(train_data.columns.indexOf(_))
    val train_lp = train_data.rdd.map(r => LabeledPoint(r.getDouble(targetInd),
      Vectors.dense(featInd.map(r.getDouble(_)).toArray)))

But when I call SVMWithSGD.train(train_lp, numIterations)

it gives me:

Driver stacktrace:
  at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1889)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1877)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1876)
  at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
  at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
  at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1876)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
  at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:926)
  at scala.Option.foreach(Option.scala:257)
  at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:926)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2110)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2059)
  at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2048)
  at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
  at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:737)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2061)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2082)
  at org.apache.spark.SparkContext.runJob(SparkContext.scala:2101)
  at org.apache.spark.rdd.RDD$$anonfun$take$1.apply(RDD.scala:1364)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.take(RDD.scala:1337)
  at org.apache.spark.rdd.RDD$$anonfun$first$1.apply(RDD.scala:1378)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
  at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
  at org.apache.spark.rdd.RDD.withScope(RDD.scala:363)
  at org.apache.spark.rdd.RDD.first(RDD.scala:1377)
  at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.generateInitialWeights(GeneralizedLinearAlgorithm.scala:204)
  at org.apache.spark.mllib.regression.GeneralizedLinearAlgorithm.run(GeneralizedLinearAlgorithm.scala:234)
  at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:217)
  at org.apache.spark.mllib.classification.SVMWithSGD$.train(SVM.scala:255)
  ... 55 elided
Caused by: java.lang.ClassCastException: java.lang.Double cannot be cast to org.apache.spark.mllib.linalg.Vector

My train_data was created from a label (the file name) and features (a JSON file representing image features).

Try using this -

Schema

train_data.printSchema
root
 |-- label: string (nullable = true)
 |-- features: vector (nullable = true)
 |-- label_index: double (nullable = false)

Modify your code as -

  val train_lp = train_data.rdd.map(r => LabeledPoint(r.getAs[Double]("label_index"), r.getAs[Vector]("features")))

The problem with your original code is that it reads the features column with r.getDouble, but that cell holds a Vector, not a Double; getAs[Vector]("features") retrieves the cell as the type it actually contains.
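The mechanism behind the exception can be shown without Spark at all: a Row stores every cell as Any, and accessors like getDouble or getAs simply cast the boxed value back to the requested type, throwing ClassCastException on a mismatch. A minimal sketch (the Array[Double] stands in for the MLlib Vector; no Spark dependency is assumed):

```scala
// Spark-free sketch: Row cells are stored as Any; getDouble/getAs just cast.
object CastDemo {
  def main(args: Array[String]): Unit = {
    val doubleCell: Any = java.lang.Double.valueOf(1.0) // like a label_index cell
    val vectorCell: Any = Array(0.1, 0.2)               // stands in for a features Vector

    // Casting each cell to its actual runtime type succeeds...
    val label = doubleCell.asInstanceOf[Double]
    val feats = vectorCell.asInstanceOf[Array[Double]]

    // ...but casting a cell to a type it does not contain throws
    // ClassCastException, exactly as "java.lang.Double cannot be cast to
    // org.apache.spark.mllib.linalg.Vector" reports.
    val wrongCast = scala.util.Try(doubleCell.asInstanceOf[Array[Double]])

    println(s"label=$label feats=${feats.mkString(",")} wrongCastFailed=${wrongCast.isFailure}")
  }
}
```

This is also why the error only surfaces when SVMWithSGD.train runs: the map over train_data.rdd is lazy, so the bad cast executes the first time the algorithm materializes a record.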


