
spark sql select double column but says FloatWritable cannot cast to DoubleWritable

My code just selects a double column, but I get a FloatWritable cannot be cast to DoubleWritable error. Is that because Spark follows a different code path when reading a Hive double column?

val testBase = spark.sql(
  s"""select
     | cast(clk_rate_7_day as double)
     |from $trainDataInput
     |where ds between date_sub('$jobDate', 0) and '$jobDate'
     |union all
     |select
     | cast(clk_rate_7_day as double)
     |from $trainDataYuncunInput
     |where ds between date_sub('$jobDate', 0) and '$jobDate'
     |""".stripMargin)
testBase.show()

but I get this error:

testBase: org.apache.spark.sql.DataFrame = [clk_rate_7_day: double]
org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 2.0 failed 4 times, most recent failure: Lost task 0.3 in stage 2.0 (TID 11, hadoop4445.jd.163.org, executor 5): java.lang.ClassCastException: org.apache.hadoop.io.FloatWritable cannot be cast to org.apache.hadoop.io.DoubleWritable
    at org.apache.hadoop.hive.serde2.objectinspector.primitive.WritableDoubleObjectInspector.get(WritableDoubleObjectInspector.java:36)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$8.apply(TableReader.scala:423)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$14$$anonfun$apply$8.apply(TableReader.scala:423)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:460)
    at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$2.apply(TableReader.scala:451)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:410)
    at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
    at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
    at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:255)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:247)
 

Problem solved!

It turns out that I had used Spark to write this column as a FloatType, while the Hive table schema declares it as double, which produced this cast error at read time.
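A minimal sketch of the fix on the write side, assuming df is the DataFrame being written and trainDataInput is the target Hive table (both names are placeholders, not from the original post): cast the column to DoubleType before writing so the stored files match the table's declared type.

import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.DoubleType

// Make the DataFrame column type match the Hive DDL (double) before writing.
// "df" and this write path are assumptions for illustration only.
val fixed = df.withColumn("clk_rate_7_day", col("clk_rate_7_day").cast(DoubleType))
fixed.write.mode("overwrite").insertInto(trainDataInput)

Alternatively, the Hive table's column type could be changed to float so the metastore matches what was actually written.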

Btw, why can't a Float be cast to a Double in Spark?
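For what it's worth, the stack trace suggests the failure happens below the SQL layer: Hive's WritableDoubleObjectInspector performs a plain Java cast of the deserialized value to DoubleWritable, and FloatWritable is not a subtype of DoubleWritable, so the JVM throws before Spark's SQL cast can widen anything. A minimal illustration of that JVM-level failure:

import org.apache.hadoop.io.{DoubleWritable, FloatWritable}

// What WritableDoubleObjectInspector.get effectively does: a hard Java
// cast, not a numeric widening. FloatWritable does not extend
// DoubleWritable, so this throws java.lang.ClassCastException.
val raw: Any = new FloatWritable(0.5f)
val value = raw.asInstanceOf[DoubleWritable].get()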
