UnicodeEncodeError: 'ascii' codec can't encode character error
I am reading some files from Google Cloud Storage using Python:
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName('aggs').getOrCreate()
df = spark.read.option("sep","\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True,encoding='utf-8')
df.count()
df.show(10)
However, I keep getting an error on the df.show(10) line:
df.show(10)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)
I googled and found that this seems to be a common error, and that the suggested fix is to add the "UTF-8" encoding to spark.read.option, which I have already done. Since that hasn't helped and I am still getting the error, could experts help? Thanks in advance.
How about exporting PYTHONIOENCODING before running your Spark job:
export PYTHONIOENCODING=utf8
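PYTHONIOENCODING overrides the encoding Python uses for stdin, stdout, and stderr, so the driver writes UTF-8 instead of falling back to ASCII when df.show() hits a non-ASCII character. You can also set it inline for a single run, e.g. PYTHONIOENCODING=utf8 spark-submit your_job.py (your_job.py here is just a placeholder script name).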
For Python 3.7+, the following should also do the trick:
sys.stdout.reconfigure(encoding='utf-8')
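As a minimal sketch of where that call fits (reusing the df name from the question's script), the reconfigure goes before the first show():

import sys

# Re-open standard output with UTF-8 (available since Python 3.7),
# so printing rows containing non-ASCII characters no longer raises.
sys.stdout.reconfigure(encoding='utf-8')

df.show(10)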
For Python 2.x you can use the following:
import sys
reload(sys)
sys.setdefaultencoding('utf-8')
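The reload(sys) is needed because Python 2's startup code removes setdefaultencoding from the sys module after initialization; reloading the module restores it. Note that changing the process-wide default encoding this way is widely regarded as a hack and can mask encoding bugs elsewhere in the program.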