In a text file (test.txt), my string looks like this:
Gro\u00DFbritannien
Reading it, python escapes the backslash:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'
How can I have this interpreted as unicode? decode()
and unicode()
won't do the job.
The following code writes Gro\ßbritannien
back to the file, but I want it to be Großbritannien
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)
You want to use the unicode_escape
codec:
>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien
See the docs for the vast number of standard encodings that come as part of the Python standard library.
Use the built-in 'unicode_escape' codec:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'
You may also use codecs.open()
:
>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'
The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.