I have a set of files with broken characters. The code below works.
w1252 = "QWERTYUIOPASDFGHJKLZXCVBNM"
w1251 = "qwertyuiopasdfghjklzxcvbnm"
def str_fix(string_w1252):
    for i in zip(w1252, w1251):
        string_w1252 = string_w1252.replace(i[0], i[1])
    return string_w1252
print(str_fix("MY STRING"))  # my string
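For comparison, here is a sketch of the same mapping using `str.translate`, assuming Python 3. `str.maketrans` builds the lookup table once, and `translate` rewrites the string in a single pass. A character produced by one pair therefore can never be rewritten again by a later pair, which the chained `replace()` loop does not guarantee.

```python
# Sketch, assuming Python 3 (str is Unicode text).
SRC = "QWERTYUIOPASDFGHJKLZXCVBNM"
DST = "qwertyuiopasdfghjklzxcvbnm"

# str.maketrans requires the two strings to have the same length.
TABLE = str.maketrans(SRC, DST)


def str_fix(text):
    """Replace each character found in SRC with the character at the same index in DST."""
    return text.translate(TABLE)


print(str_fix("MY STRING"))  # my string
```

Characters that are not in `SRC` (spaces, digits) pass through unchanged.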
When I replace the two strings with a Windows-1252 to Windows-1251 mapping:
w1252 = "¨ÉÖÓÊÅÍÃØÙÇÕÚÔÛÂÀÏÐÎËÄÆÝß×ÑÌÈÒÜÁÞ¸éöóêåíãøùçõúôûâàïðîëäæýÿ÷ñìèòüáþ"
w1251 = "ЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮёйцукенгшщзхъфывапролджэячсмитьбю"
def str_fix(string_w1252):
    for i in zip(w1252, w1251):
        string_w1252 = string_w1252.replace(i[0], i[1])
    return string_w1252
print(str_fix("# Ïåðåìåííûå"))
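A sketch of the Cyrillic version under Python 3, where source files are read as UTF-8 by default (PEP 3120) and every `str` holds Unicode characters, so no coding declaration is needed. It assumes the table pairs each Windows-1252 character with the Windows-1251 letter at the same byte value, so `Þ` (byte 0xDE) pairs with `Ю`. Unlike `zip()`, which silently stops at the shorter string and shifts every later pair if one character is missing, `str.maketrans` raises `ValueError` when the two strings differ in length.

```python
# Sketch, assuming Python 3 and a source file saved as UTF-8.
# Each Windows-1252 character is paired with the Windows-1251 letter
# that has the same byte value (e.g. Ï and П are both 0xCF).
W1252 = "¨ÉÖÓÊÅÍÃØÙÇÕÚÔÛÂÀÏÐÎËÄÆÝß×ÑÌÈÒÜÁÞ¸éöóêåíãøùçõúôûâàïðîëäæýÿ÷ñìèòüáþ"
W1251 = "ЁЙЦУКЕНГШЩЗХЪФЫВАПРОЛДЖЭЯЧСМИТЬБЮёйцукенгшщзхъфывапролджэячсмитьбю"

# Raises ValueError if the strings differ in length, instead of
# silently misaligning the pairs as zip() would.
TABLE = str.maketrans(W1252, W1251)


def str_fix(text):
    """Map garbled Windows-1252 characters back to the Cyrillic letters they stand for."""
    return text.translate(TABLE)


print(str_fix("# Ïåðåìåííûå"))  # # Переменные
```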
I get:
SyntaxError: Non-ASCII character '\xc2' in file replacer.py on line 1, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details
When I add this declaration at the top of the file:
# -*- coding: windows-1251 -*-
I get SyntaxError: 'charmap' codec can't decode byte 0x98 in position 0: character maps to <undefined>
and with this declaration instead:
# -*- coding: utf-8 -*-
I get #���������������������
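The likely reason for the `�` output under Python 2: with the UTF-8 declaration, a plain `"..."` literal is a byte string holding UTF-8 bytes, so `zip()` pairs single bytes instead of characters, and `replace()` rewrites fragments of multi-byte characters. The sketch below reproduces this in Python 3 with an explicit `bytes` object standing in for the Python 2 literal (the variable names are only illustrative).

```python
# Sketch in Python 3: raw stands in for a Python 2 "..." literal in a file
# declared "# -*- coding: utf-8 -*-", which holds UTF-8 bytes, not characters.
text = "Ï"
raw = text.encode("utf-8")

print(len(text), len(raw))  # 1 2  (one character, two bytes)
print(list(raw))            # [195, 143]  (iteration and zip() see single bytes)

# One byte of a two-byte character on its own is invalid UTF-8; a decoder
# (or terminal) shows it as the replacement character U+FFFD.
print(ascii(raw[:1].decode("utf-8", errors="replace")))  # '\ufffd'
```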
How can I make Python read the characters in the two strings exactly as they appear in my editor?
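For your Python files you should use:
# -*- coding: utf-8 -*-
in the header, save the file as UTF-8, and write all code in UTF-8. In Python 2, also mark the strings as Unicode literals (u"...") so they are handled as characters rather than bytes. Then all your Cyrillic characters are supported and nothing is lost.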
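For example, in Python 2 with a UTF-8 terminal, a byte string can be decoded from UTF-8 and re-encoded as Windows-1251: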
>>> 'абырвалг'.decode('UTF-8').encode('windows-1251')
'\xe0\xe1\xfb\xf0\xe2\xe0\xeb\xe3'
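Since the garbled text looks like Windows-1251 bytes that were decoded as Windows-1252, the two codecs can undo it directly, without a hand-written table. A sketch, assuming Python 3 and that assumption about how the text was garbled; `fix_mojibake` is a hypothetical helper name.

```python
# Sketch, assuming Python 3 and text that is Windows-1251 bytes
# wrongly decoded as Windows-1252.
def fix_mojibake(text):
    # Re-encode with the wrong codec to recover the original bytes,
    # then decode them with the codec they were really written in.
    return text.encode("cp1252").decode("cp1251")


print(fix_mojibake("# Ïåðåìåííûå"))  # # Переменные
```

Caveat: `encode("cp1252")` raises `UnicodeEncodeError` if the text contains characters Windows-1252 cannot represent (for example, Cyrillic that is already correct), and `decode("cp1251")` raises `UnicodeDecodeError` on byte 0x98, which Windows-1251 leaves undefined.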