UnicodeEncodeError: 'ascii' codec can't encode character error
I am reading some files from Google Cloud Storage using Python:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.option("sep","\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True,encoding='utf-8')
df.count()
df.show(10)
However, I keep getting an error on the df.show(10) line:
df.show(10)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)
I googled and found that this seems to be a common error, and that the suggested fix is to add the "UTF-8" encoding to spark.read.option, which I have already done. Since that hasn't helped and I am still getting the error, could experts help? Thanks in advance.
How about exporting PYTHONIOENCODING before running your Spark job:
export PYTHONIOENCODING=utf8
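PYTHONIOENCODING overrides the encoding Python uses for stdin, stdout, and stderr, so the driver writes UTF-8 instead of falling back to ASCII when df.show() hits a non-ASCII character. You can also set it inline for a single run, e.g. PYTHONIOENCODING=utf8 spark-submit your_job.py (your_job.py here is just a placeholder script name).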
For Python 3.7+, the following should also do the trick:
sys.stdout.reconfigure(encoding='utf-8')
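As a minimal sketch of where that call fits (reusing the df name from the question's script), the reconfigure goes before the first show():

import sys

# Re-open standard output with UTF-8 (available since Python 3.7),
# so printing rows containing non-ASCII characters no longer raises.
sys.stdout.reconfigure(encoding='utf-8')

df.show(10)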
For Python 2.x you can use the following:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
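The reload(sys) is needed because Python 2's startup code removes setdefaultencoding from the sys module after initialization; reloading the module restores it. Note that changing the process-wide default encoding this way is widely regarded as a hack and can mask encoding bugs elsewhere in the program.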