
Encoding issue when reading file in Python

I have a file containing

    foo = "Gro\xdfbritannien"

I'm using the following code, but it always displays the original text, with the literal \x escape:

    import codecs
    f = codecs.open('myfile', 'r', 'utf8')
    for line in f:
      print line
      print line.encode('utf-8')
      print line.decode('utf-8')

I can't see how to display the properly decoded text, the way it appears when I do:

    >>> print u'Gro\xdfbritannien'
    Großbritannien

Any hint would be appreciated!

When your file contains the line

    foo = "Gro\xdfbritannien"

it contains an actual backslash character, followed by x, d and f. So if that line is read into a Python string, it is read as

    'foo = "Gro\\xdfbritannien"'

(and since those are all ASCII characters, it doesn't matter if you open it with the utf-8 codec or not).
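To check this yourself, here is a minimal sketch in Python 3 (the code in this thread is Python 2). It writes the line to a temporary file, which stands in for the asker's myfile, reads it back with the UTF-8 codec, and confirms that the backslash survives as an ordinary ASCII character:

```python
import os
import tempfile

# Write the line exactly as it appears in the file: a real backslash
# followed by 'x', 'd', 'f' (the raw string keeps the backslash literal).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write(r'foo = "Gro\xdfbritannien"' + "\n")

# Read it back with the UTF-8 codec, as codecs.open(..., 'utf8') would.
with open(path, encoding="utf-8") as f:
    line = f.readline().rstrip("\n")
os.remove(path)

print(line)            # foo = "Gro\xdfbritannien"
print(repr(line))      # 'foo = "Gro\\xdfbritannien"'
print("\\" in line)    # True: the backslash is an ordinary character
print(line.isascii())  # True: all ASCII, so the choice of codec is irrelevant
```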

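So you need to decode it first using the string_escape codec (here foo holds the value 'Gro\\xdfbritannien'), which turns the four characters \xdf into the single byte 0xDF: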

    >>> foo.decode("string_escape")
    'Gro\xdfbritannien'

and then decode the resulting byte string as Latin-1 (where byte 0xDF is ß) to get the correct Unicode object

    >>> _.decode("latin1")
    u'Gro\xdfbritannien'

which you can then print

    >>> print _
    Großbritannien
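The session above is Python 2. In Python 3 the string_escape codec no longer exists and str has no .decode() method. A minimal Python 3 sketch, assuming the escaped text is pure ASCII as in this example, uses the unicode_escape codec instead: it maps \xdf straight to code point U+00DF (ß), the same character the Latin-1 step produces above, so the two steps collapse into one:

```python
# What was read from the file: a literal backslash followed by 'xdf'.
raw = 'Gro\\xdfbritannien'

# Encode to bytes first (Python 3 str has no .decode()), then let the
# unicode_escape codec interpret the \xdf escape as U+00DF.
# Assumption: the text is pure ASCII; encode("ascii") raises
# UnicodeEncodeError otherwise, and such input would need more care.
decoded = raw.encode("ascii").decode("unicode_escape")

print(decoded)                          # Großbritannien
print(decoded == "Gro\xdfbritannien")   # True
```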

The codec has nothing to do with it. What you are reading is equivalent to 'foo = "Gro\\xdfbritannien"' (an escaped, literal backslash), which prints like this:

    >>> print u'Gro\\xdfbritannien'
    Gro\xdfbritannien
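The difference between the escaped text and the real character is easy to see by comparing lengths. A short Python 3 sketch (print is a function there) using the same two strings:

```python
escaped = 'Gro\\xdfbritannien'  # backslash, 'x', 'd', 'f' are four separate characters
actual = 'Gro\xdfbritannien'    # \xdf is a single character, U+00DF (ß)

print(escaped)                   # Gro\xdfbritannien
print(actual)                    # Großbritannien
print(len(escaped), len(actual)) # 17 14
```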

The technical post webpages of this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 