
UnicodeEncodeError: 'ascii' codec can't encode character error

I am reading some files from Google Cloud Storage using Python:

spark = SparkSession.builder.appName('aggs').getOrCreate()

df = spark.read.option("sep", "\t").option("encoding", "UTF-8").csv('gs://path/', inferSchema=True, header=True, encoding='utf-8')
df.count()
df.show(10)

However, I keep getting an error pointing at the df.show(10) line:

df.show(10)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/dataframe.py", line 350, in show
UnicodeEncodeError: 'ascii' codec can't encode character u'\ufffd' in position 162: ordinal not in range(128)

I googled and found that this seems to be a common error, and that the solution is supposedly to add the "UTF-8" encoding to spark.read.option, which I have already done. Since that doesn't help and I am still getting this error, could experts help? Thanks in advance.

How about exporting PYTHONIOENCODING before running your Spark job:

export PYTHONIOENCODING=utf8
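To check that this actually takes effect, here is a small self-contained sketch (plain Python, no Spark needed, and independent of the job above): it spawns a child interpreter with PYTHONIOENCODING set, the same way the export affects the Spark driver's Python process, and asks the child what encoding its stdout uses.

```python
import os
import subprocess
import sys

# Run a child interpreter with PYTHONIOENCODING in its environment and
# have it report the encoding of its own sys.stdout.
env = dict(os.environ, PYTHONIOENCODING='utf8')
out = subprocess.run(
    [sys.executable, '-c', 'import sys; print(sys.stdout.encoding)'],
    env=env, capture_output=True, text=True, check=True,
).stdout.strip()
print(out)
```

The child reports a UTF-8 stdout regardless of the locale it inherited, which is why df.show() stops raising UnicodeEncodeError once the variable is exported before the job starts.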

For Python 3.7+ the following should also do the trick:

sys.stdout.reconfigure(encoding='utf-8')
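The mechanism can be demonstrated without Spark. The sketch below (an illustration, not Spark itself) uses an in-memory text stream in place of sys.stdout: writing U+FFFD to an ASCII-configured stream raises exactly the error from the traceback above, and reconfigure() (Python 3.7+) switches the same stream to UTF-8 in place so the write succeeds.

```python
import io

# Simulate a driver whose stdout is ASCII-only, which is what makes
# df.show() fail on non-ASCII characters.
buf = io.TextIOWrapper(io.BytesIO(), encoding='ascii')

failed = False
try:
    buf.write(u'\ufffd')   # the replacement character from the traceback
except UnicodeEncodeError:
    failed = True          # same error class as the one raised by df.show(10)

# Python 3.7+: switch the existing stream to UTF-8 in place.
buf.reconfigure(encoding='utf-8')
buf.write(u'\ufffd')       # now succeeds
buf.flush()

print(failed)                  # ASCII could not encode the character
print(buf.buffer.getvalue())   # the UTF-8 bytes for U+FFFD
```

Calling sys.stdout.reconfigure(encoding='utf-8') at the top of the Spark script does the same thing to the real stdout, so df.show() can print non-ASCII data.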

For Python 2.x you can use the following:

import sys
reload(sys)  # restores sys.setdefaultencoding, which site.py deletes at startup
sys.setdefaultencoding('utf-8')
