Getting [B cannot be cast to java.lang.String when using Spark SQL
My issue is that I'm trying to read data from a sql.Row as a String. I'm using pyspark, but I've heard people have this issue with the Scala API too.
The pyspark.sql.Row object is a pretty intransigent creature. The following exception is thrown:
java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)
So what we have is one of the fields being represented as a byte array. The following Python printing constructs do NOT work:
repr(sqlRdd.take(2))
Also:
import pprint
pprint.pprint(sqlRdd.take(2))
Both result in the ClassCastException.
So... how do other folks do this? I started to roll my own (unfortunately I can't copy/paste it here), but this is a bit of reinventing the wheel... or so I suspect.
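For reference, a "roll your own" workaround can be as simple as decoding any bytes/bytearray fields before printing. This is a minimal sketch (the field names and sample data are hypothetical, and it assumes the binary columns actually hold UTF-8 text):

```python
# Sketch: decode byte-array fields to strings before printing.
# Assumes the binary columns contain UTF-8 text; adjust the encoding
# if your data differs.

def decode_row(values):
    """Return a tuple with any bytes/bytearray fields decoded to str."""
    return tuple(
        v.decode("utf-8") if isinstance(v, (bytes, bytearray)) else v
        for v in values
    )

# In pyspark this would be applied per Row before printing, e.g.:
#   sqlRdd.map(lambda row: decode_row(tuple(row))).take(2)
rows = [(bytearray(b"alice"), 30), (b"bob", 25)]
print([decode_row(r) for r in rows])  # [('alice', 30), ('bob', 25)]
```

This sidesteps the cast on the JVM side by never asking Spark to treat the field as a string, but the accepted answer below avoids the mapping step entirely.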
Try:
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
I think they broke this in Spark 1.1.0 - reading binary as strings used to work, then they made it not work, but added this flag with its default set to false.
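To show where the flag goes: it has to be set on the SQLContext before the Parquet data is loaded. This is a configuration sketch only (the app name and file path are hypothetical):

```python
# Sketch: set binaryAsString before reading Parquet so binary columns
# come back as strings. Path and app name are placeholders.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="binary-as-string")
sqlContext = SQLContext(sc)
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

sqlRdd = sqlContext.parquetFile("/path/to/data.parquet")  # hypothetical path
print(sqlRdd.take(2))  # string-typed fields print normally, no [B cast error
```

Note the flag only affects Parquet sources; data already loaded before the setConf call keeps its binary schema.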