Encoding issue when reading file in Python

Question

I have a file containing

    foo = "Gro\xdfbritannien"

I'm reading it with the following code, but it always prints the original text with the literal \x escape:

    import codecs
    f = codecs.open('myfile', 'r', 'utf8')
    for line in f:
      print line
      print line.encode('utf-8')
      print line.decode('utf-8')

I can't see how to get the properly decoded text displayed, the way it is when I do this:

    >>> print u'Gro\xdfbritannien'
    Großbritannien

Any hint would be appreciated!

Answer 1

When your file contains the line

    foo = "Gro\xdfbritannien"

it contains an actual backslash character, followed by x, d and f. So if that line is read into a Python string, it is read as

    'foo = "Gro\\xdfbritannien"'

(and since those are all ASCII characters, it doesn't matter if you open it with the utf-8 codec or not).
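A quick way to confirm this is to look at the raw line after reading it. The following is a minimal sketch, assuming Python 3 (the question itself uses Python 2) and a temporary file standing in for myfile; the variable names are illustrative only.

```python
import os
import tempfile

# The raw string keeps the backslash literal, so the file receives the four
# characters backslash, x, d, f, exactly as in the question.
content = r'foo = "Gro\xdfbritannien"' + "\n"

# Hypothetical stand-in for 'myfile' from the question.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False,
                                 encoding="utf-8") as tmp:
    tmp.write(content)
    path = tmp.name

with open(path, encoding="utf-8") as f:
    line = f.readline().rstrip("\n")
os.remove(path)

# repr() shows the single backslash doubled, as in the answer above.
print(repr(line))
# Every character is plain ASCII, so the codec used to open the file is irrelevant.
only_ascii = all(ord(ch) < 128 for ch in line)
print(only_ascii)
```

The line holds one backslash and no ß at all, which is why printing it shows the escape sequence rather than the character.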

So you first need to decode it using the string_escape codec (here foo holds the escaped text Gro\xdfbritannien as read from the file):

    >>> foo.decode("string_escape")
    'Gro\xdfbritannien'

and then decode it to the correct Unicode object:

    >>> _.decode("latin1")
    u'Gro\xdfbritannien'

which you can then print:

    >>> print _
    Großbritannien
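The string_escape codec exists only in Python 2. For readers on Python 3, the two steps above can be approximated with the unicode_escape codec. This is a sketch under the assumption that the escaped text contains only Latin-1 characters, as in the question; the helper name unescape_latin1 is made up for illustration.

```python
def unescape_latin1(text):
    """Turn literal escapes such as \\xdf into the characters they name."""
    # Go back to bytes first. Latin-1 maps each code point below 256 to the
    # same byte value; anything outside that range raises UnicodeEncodeError
    # (assumption: the input is ASCII or Latin-1 only).
    raw = text.encode("latin-1")
    # unicode_escape interprets the backslash escapes and decodes all other
    # bytes as Latin-1, so it covers both decode steps from the answer.
    return raw.decode("unicode_escape")


# Raw string: a real backslash followed by x, d, f, like the file content.
value = unescape_latin1(r"Gro\xdfbritannien")
print(value)
```

On a terminal that can display the character, the print call shows the word with ß in place of the escape sequence. Text without escapes passes through unchanged.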

Answer 2

Codecs have nothing to do with this. The line is effectively 'foo = "Gro\\xdfbritannien"', with an escaped backslash, so printing it shows the escape sequence literally:

    >>> print u'Gro\\xdfbritannien'
    Gro\xdfbritannien

solution1
4 ACCEPTED 2014-02-13 09:12:53

solution2
-1 2014-02-13 09:20:44
