
Creation of RDD[LabeledPoint]: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double

I have written the following code to convert a SQL DataFrame df to RDD[LabeledPoint]:

val targetInd = df.columns.indexOf("myTarget")
val ignored = List("myTarget")
val featInd = df.columns.diff(ignored).map(df.columns.indexOf(_))

df.printSchema

val dfLP = df.rdd.map(r => LabeledPoint(
  r.getDouble(targetInd),
  Vectors.dense(featInd.map(r.getDouble(_)).toArray)
))

The schema looks like this:

root
 |-- myTarget: long (nullable = true)
 |-- var1: long (nullable = true)
 |-- var2: double (nullable = true)

When I run dfLP.foreach(l => l.label), the following error occurs:

java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double

How can I cast the label to double? I assumed the other features could be either double or long, is that right? If not, I will also need to cast the rest of the features to double.
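A note on where the exception likely comes from: as far as I understand it, a Row keeps its values as boxed objects, and getDouble amounts to an unboxing cast rather than a numeric conversion. A long column therefore holds java.lang.Long values, and asking for a Double fails for the label and for var1 alike. The sketch below reproduces that behaviour in plain Scala without Spark; BoxedCastDemo, unsafeDouble and toDouble are hypothetical names used only for illustration.

```scala
object BoxedCastDemo {
  // Stand-in for a value read from a long column: a boxed java.lang.Long.
  val boxed: Any = java.lang.Long.valueOf(3L)

  // An unboxing cast: throws ClassCastException when the boxed value is not a java.lang.Double.
  def unsafeDouble(v: Any): Double = v.asInstanceOf[Double]

  // A numeric conversion: works for any boxed java.lang.Number (Long, Integer, Double, ...).
  def toDouble(v: Any): Double = v match {
    case n: java.lang.Number => n.doubleValue()
    case other => throw new IllegalArgumentException(s"not numeric: $other")
  }

  def main(args: Array[String]): Unit = {
    println(toDouble(boxed))                            // converts the Long to a Double
    println(scala.util.Try(unsafeDouble(boxed)).isFailure) // the direct cast fails
  }
}
```

Applied to the original mapping, the same idea should look something like r.getAs[Number](i).doubleValue for each index, although casting the columns up front is usually simpler.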

You could try casting all columns to double before mapping. Using foldLeft should do the trick:

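// Cast every column to double; this returns a new DataFrame, so keep a reference to it
val dfDouble = df.columns.foldLeft(df) {
  (newDF, colName) => newDF.withColumn(colName, df(colName).cast("double"))
}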
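Putting the cast and the mapping together, an end-to-end version might look like the sketch below. It assumes Spark 2.x or later (SparkSession), the mllib flavour of LabeledPoint and Vectors, and a small made-up DataFrame with the same schema as in the question; CastToLabeledPoint, castAllToDouble and toLabeledPoints are hypothetical names. Null values are not handled here: since the columns are nullable, rows containing nulls would probably need to be dropped or filled first.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastToLabeledPoint {
  // Cast every column to double; withColumn with an existing name replaces that column in place.
  def castAllToDouble(df: DataFrame): DataFrame =
    df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, col(name).cast("double"))
    }

  // Build LabeledPoints using `target` as the label and all remaining columns as features.
  def toLabeledPoints(df: DataFrame, target: String): RDD[LabeledPoint] = {
    val doubles = castAllToDouble(df)
    val targetInd = doubles.columns.indexOf(target)
    val featInd = doubles.columns.indices.filter(_ != targetInd).toArray
    doubles.rdd.map { r =>
      // getDouble is safe now because every column is of type double
      LabeledPoint(r.getDouble(targetInd), Vectors.dense(featInd.map(i => r.getDouble(i))))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("cast-to-labeled-point")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample rows with the schema from the question: long, long, double
    val df = Seq((1L, 10L, 0.5), (0L, 20L, 1.5)).toDF("myTarget", "var1", "var2")

    toLabeledPoints(df, "myTarget").collect().foreach(println)
    spark.stop()
  }
}
```

Casting at the DataFrame level keeps the row-mapping code unchanged and makes the column types visible in printSchema before anything is converted to an RDD.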
