Exception in thread "main" java.lang.UnsupportedOperationException: Schema for type Unit is not supported
Spark error: Exception in thread "main" java.lang.UnsupportedOperationException
I am writing a Scala/Spark program that finds the maximum salary among employees. The employee data is available in a CSV file, and the salary column uses a comma as a thousands separator and is prefixed with a dollar sign, e.g. $74,628.00.

To handle the comma and dollar sign, I wrote a parser function in Scala that splits each line on "," and then maps each column to a variable, which is assigned to a case class.

My parser program is shown below. To eliminate the comma and dollar sign I used the replace function to replace them with the empty string, and finally cast the string to Int.
def ParseEmployee(line: String): Classes.Employee = {
  val fields = line.split(",")
  val Name = fields(0)
  val JOBTITLE = fields(2)
  val DEPARTMENT = fields(3)
  // String.replace returns a new string, so the calls must be chained
  // (or reassigned); calling it and discarding the result does nothing
  val temp = fields(4).replace(",", "").replace("$", "")
  val EMPLOYEEANNUALSALARY = temp.toInt // cast the cleaned string to Int
  Classes.Employee(Name, JOBTITLE, DEPARTMENT, EMPLOYEEANNUALSALARY)
}
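As an aside, Scala strings (like Java's) are immutable: `String.replace` returns a new string and never modifies the receiver, so calling it without keeping the result is a no-op. A minimal sketch of the cleaning step, chaining the calls (the sample value follows the $74,628.00 format from the question):

```scala
val raw = "$74,628.00"
// Each replace returns a new string; chain the calls and keep the result
val cleaned = raw.replace(",", "").replace("$", "") // "74628.00"
// Note: toInt would throw a NumberFormatException on "74628.00"
// because of the decimal part; toDouble parses it
val salary = cleaned.toDouble
```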
My case class is shown below:
case class Employee (
  Name: String,
  JOBTITLE: String,
  DEPARTMENT: String,
  EMPLOYEEANNUALSALARY: Number
)
My Spark DataFrame SQL query is shown below:
val empMaxSalaryValue = sc.sqlContext.sql("Select Max(EMPLOYEEANNUALSALARY) From EMP")
empMaxSalaryValue.show
When I run the program, I get the following exception:
Exception in thread "main" java.lang.UnsupportedOperationException: No Encoder found for Number
- field (class: "java.lang.Number", name: "EMPLOYEEANNUALSALARY")
- root class: "Classes.Employee"
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:625)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:619)
at org.apache.spark.sql.catalyst.ScalaReflection$$anonfun$10.apply(ScalaReflection.scala:607)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:241)
at scala.collection.immutable.List.foreach(List.scala:381)
at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:241)
at scala.collection.immutable.List.flatMap(List.scala:344)
at org.apache.spark.sql.catalyst.ScalaReflection$.org$apache$spark$sql$catalyst$ScalaReflection$$serializerFor(ScalaReflection.scala:607)
at org.apache.spark.sql.catalyst.ScalaReflection$.serializerFor(ScalaReflection.scala:438)
at org.apache.spark.sql.catalyst.encoders.ExpressionEncoder$.apply(ExpressionEncoder.scala:71)
at org.apache.spark.sql.Encoders$.product(Encoders.scala:275)
at org.apache.spark.sql.SparkSession.createDataFrame(SparkSession.scala:282)
at org.apache.spark.sql.SQLContext.createDataFrame(SQLContext.scala:272)
at CalculateMaximumSalary$.main(CalculateMaximumSalary.scala:27)
at CalculateMaximumSalary.main(CalculateMaximumSalary.scala)
Any idea why I am getting this error? What mistake am I making here, and why can't it be cast to a Number?

Is there a better way of solving this problem of getting the maximum salary of an employee?
Spark SQL provides only a limited set of Encoders, which target concrete classes. Abstract classes like Number are not supported (they could be handled only with generic binary Encoders).

Since you convert the value to Int anyway, just redefine the class:
case class Employee (
  Name: String,
  JOBTITLE: String,
  DEPARTMENT: String,
  EMPLOYEEANNUALSALARY: Int
)
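With a concrete Int field, Spark can derive the product encoder for the class. Even outside Spark, the corrected class makes the maximum-salary computation straightforward; a minimal, Spark-free sketch (the parseSalary helper and the sample rows are made up for illustration):

```scala
// Corrected case class: a concrete Int instead of the abstract java.lang.Number
case class Employee(Name: String, JOBTITLE: String, DEPARTMENT: String, EMPLOYEEANNUALSALARY: Int)

// Hypothetical helper: strip "$" and thousands separators, drop any decimal part
def parseSalary(raw: String): Int =
  raw.replace(",", "").replace("$", "").takeWhile(_ != '.').toInt

val employees = Seq(
  Employee("Alice", "Engineer", "IT", parseSalary("$74,628.00")),
  Employee("Bob", "Analyst", "Finance", parseSalary("$68,100.00"))
)

// Equivalent of SELECT MAX(EMPLOYEEANNUALSALARY) FROM EMP
val maxSalary = employees.map(_.EMPLOYEEANNUALSALARY).max
```

In Spark itself, once the Dataset is built from the corrected class, the same aggregation can also be expressed with the DataFrame API, e.g. `ds.agg(max("EMPLOYEEANNUALSALARY"))` with `org.apache.spark.sql.functions.max`, instead of registering a temporary table for SQL.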