
Write a Spark DataFrame to PostgreSQL with UTF-8 encoding

I have a Spark DataFrame that must be saved to PostgreSQL. I think I have the right Python statement except for the encoding options, because I get the following error:

UnicodeEncodeError: 'ascii' codec can't encode character u'\xf3' in position 95: ordinal not in range(128)

My current statement is:

df.write.jdbc(url=jdbc_url, table='{}.{}'.format(schema_name, table_name), mode='overwrite', properties=properties)
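For reference, here is a sketch of how `jdbc_url` and `properties` are usually shaped for PostgreSQL. The helper names `build_jdbc_args` and `qualified_table`, and the host, port and credential values, are hypothetical. The `driver` class assumes the standard PostgreSQL JDBC jar is on Spark's classpath.

```python
def build_jdbc_args(host, port, database, user, password):
    """Return (url, properties) in the shape DataFrameWriter.jdbc() expects."""
    # PostgreSQL JDBC URLs have the form jdbc:postgresql://host:port/database
    url = "jdbc:postgresql://{}:{}/{}".format(host, port, database)
    properties = {
        "user": user,
        "password": password,
        # Driver class provided by the PostgreSQL JDBC jar (assumed on the classpath)
        "driver": "org.postgresql.Driver",
    }
    return url, properties


def qualified_table(schema_name, table_name):
    # Same schema-qualified name the question builds with '{}.{}'.format(...)
    return "{}.{}".format(schema_name, table_name)


# Hypothetical usage with a running SparkContext and an existing DataFrame `df`:
# jdbc_url, properties = build_jdbc_args("localhost", 5432, "db_test", "spark", "secret")
# df.write.jdbc(url=jdbc_url, table=qualified_table("public", "my_table"),
#               mode="overwrite", properties=properties)
```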

It seems that by default PySpark tries to encode the DataFrame as ASCII, so I should specify the correct encoding (UTF-8). How do I do that?
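The error is a Python `UnicodeEncodeError` raised by the `ascii` codec. The character u'\xf3' (ó, code point 243) falls outside ASCII's 0-127 range, which is why the message says "ordinal not in range(128)". The minimal reproduction below uses plain Python and does not depend on Spark. The helper `try_encode` and the sample string are hypothetical.

```python
def try_encode(text, codec):
    """Encode text with the given codec; return the bytes, or None on failure."""
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        # Raised when a character (here u'\xf3') has no mapping in the codec
        return None


sample = u"acci\xf3n"  # hypothetical value containing the character from the error
print(try_encode(sample, "ascii"))  # None: ASCII only covers ordinals 0-127
print(try_encode(sample, "utf-8"))  # u'\xf3' is encoded as the two bytes 0xC3 0xB3
```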

I've tried option("charset", "utf-8"), option("encoding", "utf-8") and many other combinations I've seen on the Internet. I've also tried adding "client_encoding": "utf8" to the properties passed to jdbc. But nothing seems to work.

Any help would be really appreciated.

Additional info:

  • Python 2.7
  • Spark 1.6.2

My database is UTF-8 encoded:

$ sudo -u postgres psql db_test -c 'SHOW SERVER_ENCODING'
 server_encoding 
-----------------
 UTF8
(1 row)

I noticed that, alongside this error, another one was hidden in the logs: the PostgreSQL driver was complaining that the table I wanted to create already existed! So I dropped it from PostgreSQL and everything went like a charm :) Unfortunately, I was not able to fully understand how one thing was related to the other... Maybe the existing table used ASCII encoding and there was some kind of incompatibility between it and the data I was trying to save?

You should check the encoding of your PostgreSQL database:

psql my_database -c 'SHOW SERVER_ENCODING'

If it is not a multibyte encoding, you may need to change it to one. See this thread for changing the database encoding:

This official documentation might also be helpful: https://www.postgresql.org/docs/9.3/static/multibyte.html
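The encoding check described in this answer can be scripted. This is a minimal sketch. It assumes that `SHOW SERVER_ENCODING` output has the layout shown in the question (header, dashed separator, value, row count). The set of multibyte server encodings is taken from the character-set table in the PostgreSQL documentation, so verify it against your server version. The helper names are hypothetical.

```python
# Server-side multibyte encodings per the PostgreSQL character-set table
# (assumption: check this list against the docs for your PostgreSQL version).
MULTIBYTE_SERVER_ENCODINGS = {
    "EUC_CN", "EUC_JP", "EUC_JIS_2004", "EUC_KR", "EUC_TW",
    "MULE_INTERNAL", "UTF8",
}


def parse_server_encoding(psql_output):
    """Extract the value from `psql -c 'SHOW SERVER_ENCODING'` output."""
    lines = [line.strip() for line in psql_output.splitlines() if line.strip()]
    # The value sits on the line right after the dashed separator;
    # raises StopIteration if no separator line is present.
    separator = next(i for i, line in enumerate(lines) if set(line) == {"-"})
    return lines[separator + 1]


def is_multibyte(encoding):
    return encoding.upper() in MULTIBYTE_SERVER_ENCODINGS


output = " server_encoding \n-----------------\n UTF8\n(1 row)\n"
encoding = parse_server_encoding(output)
print(encoding, is_multibyte(encoding))
```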
