Python character decoder

Question

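I have a set of files with broken (garbled) characters. The code below, which replaces each character of one string with the character at the same position in another, works: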

    w1252 = "QWERTYUIOPASDFGHJKLZXCVBNM"
    w1251 = "qwertyuiopasdfghjklzxcvbnm"

    def str_fix(string_w1252):
        # Replace each character of w1252 with the one at the same position in w1251
        for src, dst in zip(w1252, w1251):
            string_w1252 = string_w1252.replace(src, dst)
        return string_w1252

    print(str_fix("MY STRING"))  # my string
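A sketch of the same letter-for-letter substitution in Python 3, assuming the only goal is to map each character of one string to the character at the same position in the other. `str.maketrans` builds the lookup table in one step and raises `ValueError` if the two strings differ in length; `str.translate` then applies it in a single pass.

```python
# Mapping tables from the question: each character in SRC maps to the
# character at the same position in DST.
SRC = "QWERTYUIOPASDFGHJKLZXCVBNM"
DST = "qwertyuiopasdfghjklzxcvbnm"

# str.maketrans raises ValueError if SRC and DST have different lengths.
TABLE = str.maketrans(SRC, DST)


def str_fix(text):
    """Replace every character found in SRC with its counterpart in DST."""
    return text.translate(TABLE)


print(str_fix("MY STRING"))  # my string
```

Unlike a chain of `replace()` calls, `translate()` never re-maps a character that an earlier step has just produced.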

But when I replace the two strings with the Cyrillic mapping, like this:

    w1252 = "¨ÉÖÓÊÅÍÃØÙÇÕÚÔÛÂÀÏÐÎËÄÆÝß×ÑÌÈÒÜÁÞ¸éöóêåíãøùçõúôûâàïðîëäæýÿ÷ñìèòüáþ"
    w1251 = "ЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮёйцукенгшщзхъфывапролджэячсмитьбю"

    def str_fix(string_w1252):
        # Both strings must have the same number of characters, or zip() pairs them out of step
        for src, dst in zip(w1252, w1251):
            string_w1252 = string_w1252.replace(src, dst)
        return string_w1252

    print(str_fix("# Ïåðåìåííûå"))
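Judging by the sample, the broken text looks like Cyrillic that was stored as Windows-1251 bytes and then decoded as Windows-1252 (`Ï` in cp1252 and `П` in cp1251 are both byte 0xCF). If that assumption holds, a hand-written table may not be needed: in Python 3, re-encode with cp1252 to recover the original bytes, then decode them as cp1251. The helper name `fix_mojibake` is hypothetical.

```python
def fix_mojibake(text):
    """Undo Windows-1251 text that was mis-decoded as Windows-1252.

    Assumes every character in `text` exists in cp1252 (otherwise encode()
    raises UnicodeEncodeError) and that the recovered bytes are valid cp1251.
    """
    return text.encode("cp1252").decode("cp1251")


print(fix_mojibake("# Ïåðåìåííûå"))  # # Переменные
```

If a manual table is kept instead, checking `len(w1252) == len(w1251)` before zipping is worthwhile: `zip()` silently stops at the shorter string, so a single missing letter shifts every pair after it.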

I get:

    SyntaxError: Non-ASCII character '\xc2' in file replacer.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
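The `'\xc2'` in this message hints at how the file was saved. Assuming the editor wrote it as UTF-8 and the `w1252` line is line 1, its first non-ASCII character, `¨`, is stored as the two bytes C2 A8, and Python 2 rejects non-ASCII bytes in source without a coding declaration (PEP 263). A quick Python 3 check, with the line shortened for illustration:

```python
# Line 1 of the question's script, truncated here for illustration.
line1 = 'w1252="¨ÉÖÓ"'

data = line1.encode("utf-8")  # the bytes a UTF-8 editor would write
first_non_ascii = next(b for b in data if b > 0x7F)

print(hex(first_non_ascii))  # 0xc2
print("¨".encode("utf-8"))   # b'\xc2\xa8'
```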

When I add

    # -*- coding: windows-1251 -*-

I get:

    SyntaxError: 'charmap' codec can't decode byte 0x98 in position 0: character maps to <undefined>
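This error is consistent with the file actually being UTF-8 rather than Windows-1251: `Ø` (from the `w1252` string) is encoded in UTF-8 as C3 98, and byte 0x98 has no character assigned in Python's cp1251 codec. The coding line should name the encoding the editor really used. A Python 3 reproduction; `decodes_as` is a hypothetical helper:

```python
def decodes_as(data, codec):
    """Return True if `data` decodes cleanly with `codec`."""
    try:
        data.decode(codec)
    except UnicodeDecodeError:
        return False
    return True


utf8_bytes = "Ø".encode("utf-8")       # b'\xc3\x98'
print(utf8_bytes)
print(decodes_as(b"\x98", "cp1251"))   # False: 0x98 is unassigned in cp1251
print(decodes_as(utf8_bytes, "utf-8")) # True
```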

and with

    # -*- coding: utf-8 -*-

the output is:

    #��
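One plausible contributor to the wrong output: in Python 2, a literal without a `u` prefix is a byte string, so under `coding: utf-8` every accented or Cyrillic letter is two bytes, and `zip(w1252, w1251)` pairs single bytes rather than letters. The Python 3 sketch below imitates that with explicit `bytes` on a two-letter sample; in Python 2 itself the usual remedy is to write the literals as `u"..."`.

```python
# Python 3 imitation of Python 2 byte strings under "coding: utf-8".
src = "Ïð".encode("utf-8")   # C3 8F C3 B0
dst = "Пр".encode("utf-8")   # D0 9F D1 80
text = "Ïð".encode("utf-8")

# Iterating over bytes yields single byte values, so each pair is one byte.
for a, b in zip(src, dst):
    text = text.replace(bytes([a]), bytes([b]))

byte_result = text.decode("utf-8")
print(byte_result)  # ПЀ, not Пр: the first pair already turned every C3 into D0

# Working on text (u"..." in Python 2, plain str in Python 3) maps whole letters.
char_result = "Ïð".translate(str.maketrans("Ïð", "Пр"))
print(char_result)  # Пр
```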

How can I make Python read the characters in the two strings exactly as they appear in the editor?

Answer 1

For your Python files, use

    # -*- coding: utf-8 -*-

in the header and save all your code in UTF-8. All Cyrillic characters are supported, so nothing will be lost.

For example, in a Python 2 session with a UTF-8 terminal:

    >>> 'абырвалг'.decode('UTF-8').encode('windows-1251')
    '\xe0\xe1\xfb\xf0\xe2\xe0\xeb\xe3'
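The session above is Python 2, where `'абырвалг'` is a UTF-8 byte string that has to be decoded first. A sketch of the same round trip in Python 3, where string literals are already Unicode and only the encode step remains:

```python
encoded = "абырвалг".encode("windows-1251")
print(encoded)  # b'\xe0\xe1\xfb\xf0\xe2\xe0\xeb\xe3'

# Decoding with the same codec gives the original text back.
print(encoded.decode("windows-1251"))  # абырвалг
```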

solution1
0 2020-04-29 10:31:52
