
Python character decoder

I have a set of files with broken (mis-decoded) characters. As a test, the code below maps one set of characters to another, and it works:

    w1252= "QWERTYUIOPASDFGHJKLZXCVBNM"
    w1251= "qwertyuiopasdfghjklzxcvbnm"

    def str_fix(string_w1252):
        for i in zip(w1252, w1251):
            string_w1252 = string_w1252.replace(i[0],i[1])
        return string_w1252

    print(str_fix("MY STRING")) #my string

When I replace the two strings with the real character mapping, like this:

    w1252="¨ÉÖÓÊÅÍÃØÙÇÕÚÔÛÂÀÏÐÎËÄÆÝß×ÑÌÈÒÜÁÞ¸éöóêåíãøùçõúôûâàïðîëäæýÿ÷ñìèòüáþ"
    w1251= "ЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮёйцукенгшщзхъфывапролджэячсмитьбю"

    def str_fix(string_w1252):
        for i in zip(w1252, w1251):
            string_w1252 = string_w1252.replace(i[0],i[1])
        return string_w1252

    print(str_fix("# Ïåðåìåííûå"))

I get

    SyntaxError: Non-ASCII character '\xc2' in file replacer.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

When I add

    # -*- coding: windows-1251 -*-

I get

    SyntaxError: 'charmap' codec can't decode byte 0x98 in position 0: character maps to <undefined>

and with

    # -*- coding: utf-8 -*-

I get

    #���������������������

How can I tell Python to read the characters in the two strings exactly as they appear in the editor?

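For your Python files you should put

    # -*- coding: utf-8 -*-

in the header and save the file itself as UTF-8. The declaration only tells Python how to decode the source file, so it has to match the encoding your editor actually writes. UTF-8 covers all of your Cyrillic characters as well as the Western ones, so you won't lose anything. In Python 2, also write the strings as unicode literals (u"..."), otherwise replace() works on individual bytes rather than on characters.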
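As a rough sketch of that advice applied to the script from the question: the version below assumes the file is saved as UTF-8 and uses u"" literals so the same code behaves the same on Python 2 and Python 3. The FIX_TABLE name and the use of translate() instead of repeated replace() calls are illustrative choices, not something taken from the question.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.
from __future__ import print_function

# u"" literals are text (not bytes) on both Python 2 and Python 3.
w1252 = u"¨ÉÖÓÊÅÍÃØÙÇÕÚÔÛÂÀÏÐÎËÄÆÝß×ÑÌÈÒÜÁÞ¸éöóêåíãøùçõúôûâàïðîëäæýÿ÷ñìèòüáþ"
# The Cyrillic string needs Ю after Б so that both strings have 66
# characters and every pair lines up.
w1251 = u"ЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮёйцукенгшщзхъфывапролджэячсмитьбю"

# Hypothetical helper table: code point of the wrong character -> right character.
FIX_TABLE = dict(zip(map(ord, w1252), w1251))


def str_fix(text):
    """Replace every character listed in w1252 with its partner from w1251."""
    return text.translate(FIX_TABLE)


print(str_fix(u"# Ïåðåìåííûå"))  # prints: # Переменные
```

translate() does the whole substitution in one pass over the string, and characters that are not in the table (such as "#" and the space) are left untouched.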


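For example, in Python 2 (where a plain string literal in a UTF-8 source or terminal is a UTF-8 byte string) you can decode the bytes and re-encode them as windows-1251: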

>>> 'абырвалг'.decode('UTF-8').encode('windows-1251')
'\xe0\xe1\xfb\xf0\xe2\xe0\xeb\xe3'
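The same decode/encode idea can also repair the broken text directly, without a hand-written replacement table. The sketch below assumes the text was originally windows-1251 and was mistakenly decoded as windows-1252; fix_mojibake is a hypothetical helper name, and the two codec names are defaults you may need to change for your files.

```python
# -*- coding: utf-8 -*-
# Assumption: this file is saved as UTF-8, matching the declaration above.
from __future__ import print_function


def fix_mojibake(text, wrong="cp1252", right="cp1251"):
    """Undo a wrong decode: turn the text back into the original bytes
    with the wrong codec, then decode those bytes with the right one."""
    return text.encode(wrong).decode(right)


print(fix_mojibake(u"# Ïåðåìåííûå"))  # prints: # Переменные
```

This raises UnicodeEncodeError if the text contains a character that windows-1252 cannot represent, which usually means the guess about the two encodings is wrong. If you still have the original files as bytes, opening them with io.open(path, encoding="cp1251") avoids the broken text in the first place.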
