My issue is when trying to read data from a sql.Row as a String. I'm using pyspark, but I've heard people hit this issue with the Scala API too.
The pyspark.sql.Row object is a pretty intransigent creature. The following exception is thrown:
java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)
So one of the fields is being represented as a byte array. The following Python printing constructs do NOT work:
repr(sqlRdd.take(2))
Also
import pprint
pprint.pprint(sqlRdd.take(2))
Both result in the ClassCastException.
So... how do other folks handle this? I started to roll my own (I can't copy/paste it here, unfortunately), but this feels a bit like re-inventing the wheel, or so I suspect.
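For reference, a hand-rolled workaround along those lines is a small helper that walks each collected row and decodes any bytes/bytearray values to UTF-8 strings before printing. This is a sketch, not the asker's actual code; the helper name `decode_row` and the sample data are my own, with plain tuples standing in for collected Row objects:

```python
def decode_row(row):
    """Return a copy of a tuple-like row with bytes/bytearray fields decoded to str."""
    return tuple(
        v.decode("utf-8") if isinstance(v, (bytes, bytearray)) else v
        for v in row
    )

# Plain tuples standing in for collected Rows; against a real RDD this
# would be something like: [decode_row(r) for r in sqlRdd.take(2)]
rows = [(bytearray(b"alice"), 30), (bytearray(b"bob"), 25)]
decoded = [decode_row(r) for r in rows]
print(decoded)  # [('alice', 30), ('bob', 25)]
```

This avoids the `getString` cast entirely by never asking Spark to interpret the binary field as a string on the JVM side.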
Try
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
I think they broke this in Spark 1.1.0: reading binary columns as strings used to work, then they changed it so it doesn't, and added this flag instead, but set its default to false.
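For context, a minimal sketch of where that conf would sit in a PySpark script of that era; the app name and Parquet path are placeholders of mine, not from the question:

```python
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="binaryAsString-example")
sqlContext = SQLContext(sc)

# Tell the Parquet reader to treat BINARY columns as strings
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

# String columns read from Parquet now come back as str, not byte arrays
sqlRdd = sqlContext.parquetFile("/path/to/data.parquet")
print(sqlRdd.take(2))
```

The conf must be set before the Parquet file is read, since it changes how the reader interprets the schema.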