Getting [B cannot be cast to java.lang.String when using Spark SQL
My issue is that I'm trying to read data from a sql.Row as a String. I'm using pyspark, but I've heard people have this issue with the Scala API too.
The pyspark.sql.Row object is a pretty intransigent creature. The following exception is thrown:
java.lang.ClassCastException: [B cannot be cast to java.lang.String
at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)
So what we have is one of the fields being represented as a byte array. The following Python printing constructs do NOT work:
repr(sqlRdd.take(2))
Also:
import pprint
pprint.pprint(sqlRdd.take(2))
Both result in the ClassCastException.
So... how do other folks do this? I started to roll my own (unfortunately I can't copy/paste it here), but this is a bit of reinventing the wheel... or so I suspect.
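For reference, a "roll your own" workaround can be as simple as decoding any bytes/bytearray fields before printing. This is a minimal sketch (the field names and sample data are hypothetical, and it assumes the binary columns actually hold UTF-8 text):

```python
# Sketch: decode byte-array fields to strings before printing.
# Assumes the binary columns contain UTF-8 text; adjust the encoding
# if your data differs.

def decode_row(values):
    """Return a tuple with any bytes/bytearray fields decoded to str."""
    return tuple(
        v.decode("utf-8") if isinstance(v, (bytes, bytearray)) else v
        for v in values
    )

# In pyspark this would be applied per Row before printing, e.g.:
#   sqlRdd.map(lambda row: decode_row(tuple(row))).take(2)
rows = [(bytearray(b"alice"), 30), (b"bob", 25)]
print([decode_row(r) for r in rows])  # [('alice', 30), ('bob', 25)]
```

This sidesteps the cast on the JVM side by never asking Spark to treat the field as a string, but the accepted answer below avoids the mapping step entirely.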
Try:
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")
I think they broke this in Spark 1.1.0 - reading binary as strings used to work, then they made it not work, but added this flag with its default set to false.
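To show where the flag goes: it has to be set on the SQLContext before the Parquet data is loaded. This is a configuration sketch only (the app name and file path are hypothetical):

```python
# Sketch: set binaryAsString before reading Parquet so binary columns
# come back as strings. Path and app name are placeholders.
from pyspark import SparkContext
from pyspark.sql import SQLContext

sc = SparkContext(appName="binary-as-string")
sqlContext = SQLContext(sc)
sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

sqlRdd = sqlContext.parquetFile("/path/to/data.parquet")  # hypothetical path
print(sqlRdd.take(2))  # string-typed fields print normally, no [B cast error
```

Note the flag only affects Parquet sources; data already loaded before the setConf call keeps its binary schema.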