简体   繁体   中英

org.apache.spark.sql.types.DataTypeException: Unsupported dataType: IntegerType

I am new to Spark and Scala i am stuck on this exception, I am trying to add some extra fields, ie StructField to an existing StructType retrieved from Data Frame for a column using Spark SQL and gettting below exception.

code snippet:

val dfStruct:StructType=parquetDf.select("columnname").schema
dfStruct.add("newField","IntegerType",true)

Exception in thread "main"

 org.apache.spark.sql.types.DataTypeException: Unsupported dataType: IntegerType. If you have a struct and a field name of it has any special characters, please use backticks (`) to quote that field name, e.g. `x+y`. Please note that backtick itself is not supported in a field name.
    at org.apache.spark.sql.types.DataTypeParser$class.toDataType(DataTypeParser.scala:95)
    at org.apache.spark.sql.types.DataTypeParser$$anon$1.toDataType(DataTypeParser.scala:107)
    at org.apache.spark.sql.types.DataTypeParser$.parse(DataTypeParser.scala:111)

I can see there some open issues running on jira related to this exception but not able to understand much. I am using Spark 1.5.1 version

https://mail-archives.apache.org/mod_mbox/spark-issues/201508.mbox/%3CJIRA.12852533.1438855066000.143133.1440397426473@Atlassian.JIRA%3E

https://mail-archives.apache.org/mod_mbox/spark-issues/201508.mbox/%3CJIRA.12852533.1438855066000.143133.1440397426473@Atlassian.JIRA%3E

https://issues.apache.org/jira/browse/SPARK-9685

When you use StructType.add with a following signature:

add(name: String, dataType: String, nullable: Boolean)

dataType string should correspond to either .simpleString or .typeName . For IntegerType it is either int :

import org.apache.spark.sql.types._

IntegerType.simpleString
// String = int

or integer :

IntegerType.typeName
// String = integer

so what you need is something like this:

val schema = StructType(Nil)

schema.add("foo", "int", true)
// org.apache.spark.sql.types.StructType = 
//   StructType(StructField(foo,IntegerType,true))

or

schema.add("foo", "integer", true)
// org.apache.spark.sql.types.StructType = 
//   StructType(StructField(foo,IntegerType,true))

If you want to pass IntegerType it has to be DataType not String :

schema.add("foo", IntegerType, true)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM