Cannot create file with Polish encoding

Question

I am scraping data from a website and have run into a problem: I cannot create a file with the data in a Polish encoding. I get a lot of Unicode escape sequences, but I want the real characters instead. Could anyone help me? Thanks.

Here is part of the output I get:

le\u015bnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 \u0142y\u017cki soku z cytryny

Here is the code creating the file:

with codecs.open('recipes.txt', 'w', 'cp1250') as w:
    w.write(string)

Answer 1

On Python 3 it always gives the correct text:

leśnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 łyżki soku z cytryny

So it seems you are using Python 2, which has always had problems with Polish characters.
(Polish is my native language.)

Python 2 treats \u015b in a normal string as literal characters, not as the Unicode character ś.
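
To make the difference concrete, here is a small sketch. It assumes Python 3, and the variable names are only for illustration. A raw string keeps the backslash sequence as six separate characters, which is what the scraped data appears to contain, while a normal Python 3 literal turns it into one character.

```python
# Assumes Python 3; the names below are illustrative only.
escaped = r'le\u015bnych'  # raw string: backslash, 'u' and four hex digits stay literal
real = 'le\u015bnych'      # normal literal: the escape becomes the single character ś

print(len(escaped))     # 12
print(len(real))        # 7
print(escaped == real)  # False: the two strings hold different characters
```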

You have to encode and decode it again.

text = text.encode().decode('unicode_escape')

You should see the correct text even when you use print()
(provided your system can work with CP1250 and has a font with Polish characters).

Minimal working code

import codecs

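# Raw string, so the \uXXXX sequences stay literal, as in the scraped output
text = r'le\u015bnych, hibiskusa lub brzoskwini 250 g cukru 5 g kwasku cytrynowego 2 \u0142y\u017cki soku z cytryny'

# Turn the literal escape sequences into real Unicode characters
text = text.encode('ascii').decode('unicode_escape')
#print(text)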

with codecs.open('recipes.txt', 'w', 'cp1250') as w:
    w.write(text)
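
For Python 3, the same idea can be sketched with the built-in open() and its encoding argument instead of codecs.open(). This is only a sketch under some assumptions: the helper name unescape and the temporary file name are made up for the example, and the input is assumed to be pure ASCII with literal \uXXXX sequences. If the text already contains real non-ASCII characters, encode('ascii') raises UnicodeEncodeError.

```python
import os
import tempfile

def unescape(text):
    # Hypothetical helper: turns literal \uXXXX sequences into real characters.
    # 'unicode_escape' reads the bytes as Latin-1, so the input must be pure ASCII.
    return text.encode('ascii').decode('unicode_escape')

raw = r'2 \u0142y\u017cki soku z cytryny'  # stands in for the scraped text
text = unescape(raw)

# Write the decoded text as CP1250 (the file name is an assumption for the demo)
path = os.path.join(tempfile.gettempdir(), 'recipes_demo.txt')
with open(path, 'w', encoding='cp1250') as f:
    f.write(text)

# Read the file back with the same encoding to check the round trip
with open(path, encoding='cp1250') as f:
    read_back = f.read()

print(read_back == text)
```

Reading the file back with the same encoding is a quick way to confirm that every character in the text exists in CP1250.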

Answer 2

The solution I found useful is to append .prettify('iso-8859-1').decode('utf-8', errors='replace') to every string you need to write. But first, please read @furas's answer and his comments.

solution1
0 ACCEPTED 2020-04-23 23:27:05

solution2
0 2020-04-28 20:34:42