Python file input string: how to handle escaped unicode characters?

Question

In a text file (test.txt), my string looks like this:

Gro\u00DFbritannien

When I read it, Python escapes the backslash:

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'

How can I have this interpreted as Unicode? decode() and unicode() won't do the job.

The following code writes Gro\u00DFbritannien back to the file, but I want it to be Großbritannien:

>>> import codecs
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)

Answer 1

You want to use the unicode_escape codec:

>>> x = 'Gro\\u00DFbritannien'
>>> y = unicode(x, 'unicode_escape')
>>> print y
Großbritannien

See the docs for the full list of standard encodings that ship with the Python standard library.
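
The snippet above is Python 2. On Python 3 there is no unicode() builtin, so a rough equivalent might look like the sketch below. It assumes the input is a str holding only ASCII characters plus literal \uXXXX sequences; decode_escapes is a hypothetical helper name, not something from the standard library.

```python
def decode_escapes(text):
    """Turn literal \\uXXXX sequences in an ASCII-only str into real characters.

    Assumption: `text` contains only ASCII. encode('ascii') raises
    UnicodeEncodeError otherwise, instead of silently garbling the data.
    """
    return text.encode('ascii').decode('unicode_escape')


raw = 'Gro\\u00DFbritannien'   # what you get after reading the line from the file
decoded = decode_escapes(raw)
print(decoded)
```

The round trip through bytes is there because on Python 3 str has no decode() method; the unicode_escape codec itself is the same one used above.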

Answer 2

Use the built-in 'unicode_escape' codec:

>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'

You may also use codecs.open():

>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'

The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
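
To get from here to what the question actually asks for (Großbritannien written back to a file), the decoded text still has to be written out with a real encoding such as UTF-8. Below is a minimal sketch of the whole round trip, written for Python 3 rather than the Python 2 used above. The function name convert_escaped_file and the temporary paths are made up for illustration, and it assumes the source file holds only ASCII plus \uXXXX escapes, since unicode_escape treats any other byte as Latin-1.

```python
import os
import tempfile


def convert_escaped_file(src_path, dst_path):
    """Read a file containing literal \\uXXXX escapes and rewrite it as UTF-8."""
    # Read raw bytes so the backslash sequences reach the codec untouched.
    with open(src_path, 'rb') as src:
        raw = src.read()
    text = raw.decode('unicode_escape')
    # newline='' keeps line endings exactly as they were decoded.
    with open(dst_path, 'w', encoding='utf-8', newline='') as dst:
        dst.write(text)
    return text


# Demo with a throwaway directory standing in for test.txt / out.txt.
workdir = tempfile.mkdtemp()
src_path = os.path.join(workdir, 'test.txt')
dst_path = os.path.join(workdir, 'out.txt')
with open(src_path, 'wb') as f:
    f.write(b'Gro\\u00DFbritannien\n')

result = convert_escaped_file(src_path, dst_path)
print(result, end='')
```

The code in the question wrote the undecoded input to the file, which is why the escape sequence came back out unchanged; here the decoded text is what gets written.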
