I have a file containing
foo = "Gro\xdfbritannien"
I'm using the following code, but it always displays the original text, including the literal \x escape:
import codecs
f = codecs.open('myfile', 'r', 'utf8')
for line in f:
    print line
    print line.encode('utf-8')
    print line.decode('utf-8')
I can't see how to display the properly decoded text, which is what I get when I do:
>>> print u'Gro\xdfbritannien'
Großbritannien
Any hint would be appreciated!
When your file contains the line
foo = "Gro\xdfbritannien"
it contains an actual backslash character, followed by x, d and f. So if that line is read into a Python string, it is read as
'foo = "Gro\\xdfbritannien"'
(and since those are all ASCII characters, it doesn't matter whether you open it with the utf-8 codec or not).
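A quick way to confirm this, sketched in Python 3 rather than the Python 2 used in the question: write the line to a hypothetical temporary file standing in for myfile, read it back once with UTF-8 and once with plain ASCII, and look at the characters after "Gro".

```python
import os
import tempfile

# Hypothetical stand-in for 'myfile': write the exact characters from the
# question, i.e. a real backslash followed by "xdf" (hence the doubled \\).
content = 'foo = "Gro\\xdfbritannien"\n'
path = os.path.join(tempfile.mkdtemp(), "myfile")
with open(path, "w", encoding="ascii") as f:
    f.write(content)

# Read the line back once with UTF-8 and once with plain ASCII.
with open(path, encoding="utf-8") as f:
    line_utf8 = f.readline()
with open(path, encoding="ascii") as f:
    line_ascii = f.readline()

# Same result either way, because every character in the line is ASCII.
print(line_utf8 == line_ascii)   # True
print(repr(line_utf8))           # 'foo = "Gro\\xdfbritannien"\n'
print(line_utf8[10:14])          # \xdf (four separate characters)
```

The slice [10:14] is four characters long, which is exactly why no file codec can turn it into ß on its own.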
So you first need to decode the escape sequences using the string_escape codec (available in Python 2 only):
>>> foo.decode("string_escape")
'Gro\xdfbritannien'
and then decode the resulting Latin-1 byte string into the correct Unicode object:
>>> _.decode("latin1")
u'Gro\xdfbritannien'
which you can then print:
>>> print _
Großbritannien
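The string_escape codec no longer exists in Python 3. A rough Python 3 equivalent (an assumption on my part, not part of the answer above) is to encode the ASCII text to bytes and decode it with the unicode_escape codec. Because \xdf maps directly to U+00DF, the same code point that Latin-1 byte 0xDF decodes to, this one call covers both the string_escape and the latin1 steps.

```python
# Value read from the file: a literal backslash followed by "xdf".
foo = 'Gro\\xdfbritannien'

# Python 3 sketch: "unicode_escape" interprets backslash escapes in bytes.
# \xdf becomes U+00DF, so no separate latin1 step is needed.
# Encoding to ASCII first fails loudly on non-ASCII input, because
# unicode_escape would otherwise read raw non-escape bytes as Latin-1.
decoded = foo.encode("ascii").decode("unicode_escape")
print(decoded)  # Großbritannien
```

The same call can be applied to a whole line read from the file, e.g. line.encode("ascii").decode("unicode_escape"), as long as the line really is plain ASCII.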
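No codec is involved here. The line in your file is equivalent to the string literal 'foo = "Gro\\xdfbritannien"', with an escaped backslash, so printing it shows the backslash as-is: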
>>> print u'Gro\\xdfbritannien'
Gro\xdfbritannien
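The difference between the escaped text and the real character can be seen side by side in Python 3 (a sketch; the variable names are illustrative only): the doubled backslash gives four separate characters, while a single backslash in the literal gives one.

```python
escaped = 'Gro\\xdfbritannien'  # backslash, x, d, f: what the file contains
actual = 'Gro\xdfbritannien'    # one character, U+00DF (sharp s)

print(escaped)                     # Gro\xdfbritannien
print(len(escaped), len(actual))   # 17 14
print(escaped == actual)           # False
```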