
UnicodeEncodeError: 'ascii' codec can't encode character error

I am reading some files from Google Cloud Storage using Python:

spark = SparkSession.builder.appName('aggs').getOrCreate()

df = spark.read.option("sep", "\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True, encoding='utf-8')
df.count()
df.show(10)

However, I keep getting an error pointing at the df.show(10) line:

df.show(10)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)

I googled and found that this seems to be a common error, and that the solution is supposedly to add the "UTF-8" encoding to spark.read.option, which I have already done. Since that doesn't help and I am still getting this error, could experts help? Thanks in advance.

How about exporting PYTHONIOENCODING before running your Spark job:

export PYTHONIOENCODING=utf8
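To check that this actually takes effect, here is a small self-contained sketch (plain Python, no Spark needed, and independent of the job above): it spawns a child interpreter with PYTHONIOENCODING set, the same way the export affects the Spark driver's Python process, and asks the child what encoding its stdout uses.

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONIOENCODING in its environment and
# have it report the encoding of its own sys.stdout.
env = dict(os.environ, PYTHONIOENCODING='utf8')
out = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()
print(out)
```

The child reports a UTF-8 stdout regardless of the locale it inherited, which is why df.show() stops raising UnicodeEncodeError once the variable is exported before the job starts.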

For Python 3.7+ the following should also do the trick:

sys.stdout.reconfigure(encoding='utf-8')
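The mechanism can be demonstrated without Spark. The sketch below (an illustration, not Spark itself) uses an in-memory text stream in place of sys.stdout: writing U+FFFD to an ASCII-configured stream raises exactly the error from the traceback above, and reconfigure() (Python 3.7+) switches the same stream to UTF-8 in place so the write succeeds.

```python
import io

# Simulate a driver whose stdout is ASCII-only, which is what makes
# df.show() fail on non-ASCII characters.
buf = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

failed = False
try:
    buf.write(u'\ufffd')   # the replacement character from the traceback
except UnicodeEncodeError:
    failed = True          # same error class as the one raised by df.show(10)

# Python 3.7+: switch the existing stream to UTF-8 in place.
buf.reconfigure(encoding='utf-8')
buf.write(u'\ufffd')       # now succeeds
buf.flush()

print(failed)                  # ASCII could not encode the character
print(buf.buffer.getvalue())   # the UTF-8 bytes for U+FFFD
```

Calling sys.stdout.reconfigure(encoding='utf-8') at the top of the Spark script does the same thing to the real stdout, so df.show() can print non-ASCII data.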

For Python 2.x you can use the following:

import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py deletes at startup
sys.setdefaultencoding('utf-8')
