
Getting "[B cannot be cast to java.lang.String" when using Spark SQL

My issue is that I'm trying to read data from a sql.Row as a String. I'm using pyspark, but I've heard people hit this issue with the Scala API too.

The pyspark.sql.Row object is a pretty intransigent creature. The following exception is thrown:

java.lang.ClassCastException: [B cannot be cast to java.lang.String
 at org.apache.spark.sql.catalyst.expressions.GenericRow.getString(Row.scala:183)

So what we have is one of the fields being represented as a byte array. The following Python printing constructs do NOT work:

repr(sqlRdd.take(2))

Neither does:

import pprint
pprint.pprint(sqlRdd.take(2))

Both result in the ClassCastException.

So... how do other folks handle this? I started to roll my own (unfortunately I can't copy/paste it here), but I suspect this is a bit of reinventing the wheel.
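For anyone going the roll-your-own route, a minimal sketch of the idea: map over each row's values and decode any byte-array fields before printing. The helper name and the sample row below are hypothetical, not from the question; in pyspark the binary column typically surfaces as a Python `bytearray`, so plain `.decode()` works on it.

```python
def decode_row_values(values, encoding="utf-8"):
    """Decode any bytes/bytearray fields in a row-like sequence to str.

    Hypothetical helper: mimics what you'd map over sql.Row values
    before printing, since getString chokes on binary-backed columns.
    """
    return [
        v.decode(encoding) if isinstance(v, (bytes, bytearray)) else v
        for v in values
    ]

# A row whose first field came back as a byte array ([B on the JVM side)
raw = [bytearray(b"hello"), 42]
print(decode_row_values(raw))  # ['hello', 42]
```

In pyspark you would apply this with something like `rdd.map(lambda row: decode_row_values(list(row)))` before calling `take` or `pprint`.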

Try

sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

I think they broke this in Spark 1.1.0: reading binary columns as strings used to work, then it stopped working, but they added this flag, with its default set to false.
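If you want the setting applied to every session rather than calling `setConf` in code each time, the same property can also be set in `spark-defaults.conf` or on the `spark-submit` command line (a sketch; the property name is the one shown above):

```
# spark-defaults.conf
spark.sql.parquet.binaryAsString    true
```

or equivalently `spark-submit --conf spark.sql.parquet.binaryAsString=true ...`.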



 