Creation of RDD[LabeledPoint]: java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
I have written the following code to convert the SQL DataFrame df to RDD[LabeledPoint]:
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val targetInd = df.columns.indexOf("myTarget")
val ignored = List("myTarget")
val featInd = df.columns.diff(ignored).map(df.columns.indexOf(_))
df.printSchema
val dfLP = df.rdd.map(r => LabeledPoint(
  r.getDouble(targetInd),
  Vectors.dense(featInd.map(r.getDouble(_)).toArray)
))
The schema looks like this:
root
|-- myTarget: long (nullable = true)
|-- var1: long (nullable = true)
|-- var2: double (nullable = true)
When I run dfLP.foreach(l => l.label), the following error occurs:
java.lang.ClassCastException: java.lang.Long cannot be cast to java.lang.Double
How can I cast the label to double? I expected that the other features could be either double or long, is that right? If not, then I will also need to cast the rest of the features to double.
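For context, the error message suggests that getDouble reads the stored value with a plain cast, so a boxed Long is not converted to a Double automatically. The following is a minimal sketch in plain Scala (no Spark involved; the object and method names are hypothetical and only illustrate the behaviour) of why the cast fails and how going through java.lang.Number avoids it:

```scala
object CastDemo {
  // Reads a boxed value with a direct cast; fails if the value is a java.lang.Long.
  def viaCast(v: Any): Double = v.asInstanceOf[Double]

  // Goes through java.lang.Number, so both Long and Double values are accepted.
  def viaNumber(v: Any): Double = v.asInstanceOf[Number].doubleValue()
}
```

Under this reading, var1 (a long column) would fail in the same way as myTarget, so the feature columns need converting as well.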
You could try casting all columns to double before mapping. Using foldLeft should do the trick:
df.columns.foldLeft(df) {
  (newDF, colName) => newDF.withColumn(colName, df(colName).cast("double"))
}
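Putting the pieces together, here is a sketch of how the cast could be combined with the original mapping. It assumes Spark 2.x or later (SparkSession), the mllib LabeledPoint and Vectors classes, and no null values in any column; the object name LabeledPointExample, the helper toLabeledPoints, and the sample data are hypothetical and not part of the original post.

```scala
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object LabeledPointExample {
  // Hypothetical helper: casts every column to double, then builds LabeledPoints.
  def toLabeledPoints(df: DataFrame, target: String): RDD[LabeledPoint] = {
    // Replace each column with its double-typed version (column order is kept).
    val dfDouble = df.columns.foldLeft(df) { (acc, name) =>
      acc.withColumn(name, col(name).cast("double"))
    }
    val targetInd = dfDouble.columns.indexOf(target)
    // Every column except the target is treated as a feature.
    val featInd = dfDouble.columns.indices.filter(_ != targetInd).toArray
    dfDouble.rdd.map { r =>
      // getDouble is safe here because all columns are now double (assumes no nulls).
      LabeledPoint(r.getDouble(targetInd), Vectors.dense(featInd.map(r.getDouble)))
    }
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("lp-example").getOrCreate()
    import spark.implicits._
    // Made-up sample data with the same schema as in the question.
    val df = Seq((1L, 10L, 0.5), (0L, 20L, 1.5)).toDF("myTarget", "var1", "var2")
    toLabeledPoints(df, "myTarget").collect().foreach(println)
    spark.stop()
  }
}
```

Since the schema marks the columns as nullable, rows containing nulls would probably need to be handled first, for example with df.na.drop(), before calling getDouble.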