
Pandas dataframe with Greek characters to postgresql

I'm working with the following: Python 3.9.2, Pandas 1.3.4, PostgreSQL 14.1.

I have created a sample dataframe to explore pandas and postgresql.

import pandas as pd

d = {'col1': ['Αθήνα', 'χαρύϊψχορες'], 'col2': ['Θεσσαλονίκη', 'Ξπφδ']}
df = pd.DataFrame(data=d)

Working in Jupyter, when I call df, I get the following output:

   col1         col2
0  Αθήνα        Θεσσαλονίκη
1  χαρύϊψχορες  Ξπφδ

Then I try to add the data to PostgreSQL, using the following code:

from sqlalchemy import create_engine

engine = create_engine('postgresql://user:password@localhost:port/test_db', encoding='utf-8-sig')
df.to_sql('sq_exp', engine)

Then, if I open CMD and select the data in psql, I get this error:

test_db=# select * from sq_exp;
ERROR:  character with byte sequence 0xce 0x91 in encoding "UTF8" has no equivalent in encoding "WIN1252"

While trying to fix this error, I set the client encoding to WIN1252, only to get the same error:

test_db=# SET client_encoding TO 'WIN1252';
SET
test_db=# select * from sq_exp;
ERROR:  character with byte sequence 0xce 0x91 in encoding "UTF8" has no equivalent in encoding "WIN1252"

Following the PostgreSQL documentation (https://www.postgresql.org/docs/current/multibyte.html), I then set the client encoding to WIN1253.

Then, when I select the data, I get the following output:

test_db=# select * from sq_exp;
 index |    col1     |    col2
-------+-------------+-------------
     0 | ┴Φ▐φß       | ╚σ≤≤ßδ∩φ▀Ωτ
     1 | ≈ß±²·°≈∩±σ≥ | ╬≡÷Σ

As you can see, this is completely different from the df I initially created, and I have tried multiple approaches to solve it. Can someone please guide me through how it can be done?

Thanks a lot.

I have discovered the issue.

It seems that CMD does not support the Greek characters: the psql session there uses the WIN1252 client encoding, which has no Greek letters, and that is why the error is returned:

ERROR:  character with byte sequence 0xce 0x91 in encoding "UTF8" has no equivalent in encoding "WIN1252"
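
To see why this error appears, it may help to look at the bytes it mentions. The sketch below does not touch PostgreSQL at all; it only uses Python's built-in codecs and assumes that Python's `cp1252` and `cp1253` codecs correspond to what PostgreSQL calls WIN1252 and WIN1253. The helper `can_encode` is a hypothetical name introduced for illustration. The sequence 0xce 0x91 is the UTF-8 encoding of the Greek capital alpha, the first letter of 'Αθήνα', and that letter has no slot in the Western European code page.

```python
# Byte sequence quoted in the PostgreSQL error message.
raw = bytes([0xCE, 0x91])

# Decoded as UTF-8, it is a single character: the Greek capital alpha.
letter = raw.decode("utf-8")


def can_encode(text: str, codec: str) -> bool:
    """Return True if every character of `text` exists in `codec`."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(repr(letter))
print("cp1252:", can_encode(letter, "cp1252"))  # Western European code page
print("cp1253:", can_encode(letter, "cp1253"))  # Greek code page
```

So the stored data is valid UTF-8; the conversion to the client's encoding is what fails.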

The resolution is to view your data in your code editor or a database administration tool.

Here is an example using Visual Studio Code:

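import psycopg2

# Connection parameters are placeholders; replace them with your own.
conn_string = "host='localhost' dbname='database_name' user='postgres' password='password'"
conn = psycopg2.connect(conn_string)
cursor = conn.cursor()

# Fetch all rows from the table written by df.to_sql and print them.
cursor.execute("SELECT * FROM sq_exp")
records = cursor.fetchall()
print(records)

cursor.close()
conn.close()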

Running the above prints the expected output:

[(0, 'Αθήνα', 'Θεσσαλονίκη'), (1, 'χαρύϊψχορες', 'Ξπφδ')]

The same applies when using a database administration tool.
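
The garbled output seen with WIN1253 in the question is consistent with a console code page mismatch rather than damaged data. The following is a rough sketch under stated assumptions, not a diagnosis: it assumes the CMD window was using OEM code page 437 (the question does not say which code page was active), and it uses Python's `cp1253` and `cp437` codecs as stand-ins for the server-side conversion and the console rendering. The function name `render_in_console` is hypothetical.

```python
def render_in_console(text: str, client_encoding: str, console_codepage: str) -> str:
    """Encode text the way the server would send it, then decode it the way the console would show it."""
    return text.encode(client_encoding).decode(console_codepage)


# Assumed setup: client_encoding WIN1253, console on code page 437.
garbled = render_in_console("Αθήνα", "cp1253", "cp437")
print(garbled)

# Reversing the two steps recovers the original text, so nothing was lost.
restored = garbled.encode("cp437").decode("cp1253")
print(restored)
```

Under that assumption, the sketch reproduces the strings shown in the psql output above. If the assumption holds, switching the console to a code page that matches the client encoding before starting psql (for example `chcp 1253` together with WIN1253) may display the text correctly, although that was not tested here.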
