Apache Spark Row getAs[String]: java.lang.Byte cannot be cast to java.lang.String

Question

I have a Spark DataFrame that looks like this:

+-----------+-----+
|foo        |  bar|
+-----------+-----+
|          3|10119|
|          2| 4305|
+-----------+-----+

It has the following schema:

org.apache.spark.sql.types.StructType = StructType(
    StructField(foo,ByteType,true), 
    StructField(bar,LongType,false)
)

As you can see, the column foo is of type ByteType.

I need the value of foo in the first row as a String.

When I try

val fooStr = df.first.getAs[String](0)

I get a cast exception:

java.lang.ClassCastException: java.lang.Byte cannot be cast to java.lang.String

However, when I use toString, the conversion works:

val myStr = df.first.get(0).toString

Why do I get a ClassCastException when I use Row.getAs[String], but no error when I use toString? Is there any drawback to using toString?

Answer 1

Row.getAs[T](i) is defined as

def getAs[T](i: Int): T = get(i).asInstanceOf[T]

asInstanceOf[T] simply casts the object to the requested type without performing any conversion. If the runtime type of the value returned by get(i) is not compatible with the requested type (as with Byte and String), a ClassCastException is thrown.
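
The behaviour can be reproduced without Spark. The following is a minimal sketch, assuming that a plain value typed as Any stands in for what Row.get(i) returns; the names cell, getAs, castResult and viaToString are made up for illustration, and getAs here only mirrors the definition quoted above rather than being Spark's own method.

```scala
import scala.util.Try

// Stand-in for Row.get(0): a boxed java.lang.Byte typed as Any
// (assumption: this is what a ByteType column yields at runtime).
val cell: Any = java.lang.Byte.valueOf(3.toByte)

// Same shape as the definition quoted above, but taking the value directly.
def getAs[T](value: Any): T = value.asInstanceOf[T]

// T is erased, so nothing is checked inside getAs itself. The cast fails
// where the result is used as a String.
val castResult: Try[String] = Try {
  val s: String = getAs[String](cell)
  s
}

// toString is an ordinary method call on the Byte, so it succeeds.
val viaToString: String = cell.toString
```

Here castResult is a Failure wrapping a ClassCastException, while viaToString holds "3".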

Calling toString on the return value of get(0), however, invokes Byte.toString(). This is not a cast but a regular method call that returns a String.
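
As for drawbacks: the schema marks foo as nullable, and calling toString on a null reference throws a NullPointerException. One possible way to guard against that is sketched below, again without Spark; cellToString is a hypothetical helper, not part of the Row API.

```scala
// Hypothetical helper: turns a cell as returned by Row.get(i) into an
// Option[String]. Option(...) maps a null reference to None.
def cellToString(cell: Any): Option[String] =
  Option(cell).map(_.toString)

val present: Option[String] = cellToString(java.lang.Byte.valueOf(3.toByte))
val missing: Option[String] = cellToString(null)
```

With a real Row this would presumably be called as cellToString(df.first.get(0)). Row also provides isNullAt(i) and typed getters such as getByte(i), which may be preferable when the column type is known in advance.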
