
Getting [B cannot be cast to java.lang.String when using Spark SQL

My issue arises when trying to read data from a sql.Row as a String. I'm using pyspark, but I've heard people have this issue with the Scala API too.

The pyspark.sql.Row object is a pretty intransigent creature. The following exception is thrown:

java.lang.ClassCastException: [B cannot be cast to java.lang.String
 at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)

So one of the fields is being represented as a byte array. The following Python printing constructs do NOT work:

repr(sqlRdd.take(2))

Also

import pprint
pprint.pprint(sqlRdd.take(2))

Both result in the ClassCastException.
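(A quick way to confirm the byte-array theory is to print the schema Spark inferred; a small sketch, assuming your Spark version exposes printSchema() on the SchemaRDD:)

# A column that should be text shows up as binary here when the Parquet
# writer didn't annotate it as a UTF8 string
sqlRdd.printSchema()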

So... how do other folks handle this? I started to roll my own (can't copy/paste it here, unfortunately), but this is a bit like reinventing the wheel... or so I suspect.

Try

sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
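
For context, here's a minimal sketch of how that fits into a PySpark session (the file name data.parquet is made up for illustration):

from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="binaryAsStringExample")
sqlContext = SQLContext(sc)

# Make the Parquet reader decode BINARY columns as strings instead of
# returning raw byte arrays ([B), which is what getString chokes on
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

# With the flag set, string-typed fields come back as plain Python
# strings, and take()/repr()/pprint() work again
sqlRdd = sqlContext.parquetFile("data.parquet")
print(sqlRdd.take(2))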

I think they broke this in Spark 1.1.0: reading binary as strings used to work, then they made it stop working, but added this flag with its default set to false.
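
If it helps, the same switch can, I believe, also be flipped with a SQL statement, which is handy when all you have is sql() access:

# Equivalent to setConf(); SET commands go through the SQL parser
sqlContext.sql("SET spark.sql.parquet.binaryAsString=true")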
