Cannot create file with Polish encoding

I am scraping data from a website and have run into a problem: I cannot create a file with the data in a Polish encoding. The output contains a lot of Unicode escape sequences, and I want the real characters instead. Could anyone help me? Thanks.

Here is part of the output I get:

le\u015bnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 \u0142y\u017cki soku z cytryny

Here is the code creating the file:

import codecs

with codecs.open('recipes.txt', 'w', 'cp1250') as w:
    w.write(string)

On Python 3 this always gives the correct text:

leśnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 łyżki soku z cytryny

So it seems you are using Python 2, which has always had problems with Polish characters
(Polish is my native language).

Python 2 treats \u015b as a normal string, not as the Unicode character ś.
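
To make the difference concrete, here is a small sketch written in Python 3 syntax (the variable names are only for illustration). A raw string keeps the escape as six separate characters, which is roughly what the scraped text contains, while a normal literal holds the single character ś.

```python
# Escaped form: a backslash, the letter 'u', and four hex digits.
escaped = r'\u015b'
# Real character: the single code point U+015B.
real = '\u015b'

print(len(escaped))     # 6
print(len(real))        # 1
print(escaped == real)  # False
```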

You have to encode it and then decode it again:

text = text.encode().decode('unicode_escape')

After that you should see the correct text even when you use print()
(provided your system can work with CP1250 and has a font with Polish characters).


Minimal working code

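import codecs

# Python 2: in a plain str literal, \u015b stays as a literal escape sequence
text = 'le\u015bnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 \u0142y\u017cki soku z cytryny'

# Turn the \uXXXX escapes into real Unicode characters
text = text.encode().decode('unicode_escape')
#print(text)

# codecs.open encodes the text as CP1250 when writing
with codecs.open('recipes.txt', 'w', 'cp1250') as w:
    w.write(text)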
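
For Python 3, a possible variant is sketched below. It assumes the scraped text really contains literal \uXXXX sequences, and the helper name unescape_unicode is hypothetical. It uses a regular expression rather than encode().decode('unicode_escape'), because on Python 3 that round trip would mangle any characters that are already real non-ASCII text.

```python
import re

# Hypothetical helper: replace each literal \uXXXX sequence with the
# character it names, leaving all other text untouched.
def unescape_unicode(text):
    return re.sub(r'\\u([0-9a-fA-F]{4})',
                  lambda m: chr(int(m.group(1), 16)),
                  text)

# Raw string, so the escapes stay literal, as in the scraped data.
raw = r'le\u015bnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 \u0142y\u017cki soku z cytryny'
text = unescape_unicode(raw)

# Built-in open() takes the target encoding directly in Python 3.
with open('recipes.txt', 'w', encoding='cp1250') as f:
    f.write(text)

# Read the file back with the same encoding to check the round trip.
with open('recipes.txt', encoding='cp1250') as f:
    assert f.read() == text
```

If the data originally came from JSON, parsing it with the json module would normally decode these escapes already, so a helper like this may not be needed in that case.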

The solution I found useful was to append .prettify('iso-8859-1').decode('utf-8', errors='replace') to every string you need to write. But first, please read @furas's answer and his comments on it.


 