Python file input string: how to handle escaped unicode characters?
In a text file (test.txt), my string looks like this:
Gro\u00DFbritannien
When I read it, Python escapes the backslash:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien'
How can I have this interpreted as unicode? decode() and unicode() won't do the job.
The following code writes Gro\ßbritannien back to the file, but I want it to be Großbritannien:
>>> input.decode('latin-1')
u'Gro\\u00DFbritannien'
>>> out = codecs.open('out.txt', 'w', 'utf-8')
>>> out.write(input)
Use the built-in 'unicode_escape' codec:
>>> file = open('test.txt', 'r')
>>> input = file.readline()
>>> input
'Gro\\u00DFbritannien\n'
>>> input.decode('unicode_escape')
u'Gro\xdfbritannien\n'
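The session above appears to be Python 2, where str has a decode() method. As a rough sketch of the same idea under Python 3 (an assumption, since the original does not cover it), the text can be encoded to bytes first and then decoded with the same codec. The variable names are illustrative, and the ASCII encoding step assumes the line contains no non-ASCII characters besides the escapes:

```python
# What readline() would return: a literal backslash followed by "u00DF".
raw = "Gro\\u00DFbritannien\n"

# Python 3 str has no .decode(), so go through bytes first.
# encode("ascii") raises UnicodeEncodeError if the line already
# contains non-ASCII characters; this sketch assumes it does not.
decoded = raw.encode("ascii").decode("unicode_escape")

print(decoded)
```

Note that 'unicode_escape' also interprets other backslash sequences such as \n and \t, so it may change more than the \uXXXX escapes if the input contains them.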
You may also use codecs.open():
>>> import codecs
>>> file = codecs.open('test.txt', 'r', 'unicode_escape')
>>> input = file.readline()
>>> input
u'Gro\xdfbritannien\n'
The list of standard encodings is available in the Python documentation: http://docs.python.org/library/codecs.html#standard-encodings
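To tie this back to the question's goal of writing Großbritannien to an output file, here is a hedged end-to-end sketch for Python 3. It swaps the 'unicode_escape' codec for a small regular-expression helper that touches only \uXXXX sequences, which may be preferable when the file also holds real non-ASCII text. The helper name unescape_unicode and the temporary file paths are made up for illustration, and surrogate pairs are not handled:

```python
import io
import os
import re
import tempfile

def unescape_unicode(text):
    # Replace each literal backslash-u-plus-four-hex-digits sequence
    # with the character it names; everything else is left untouched.
    return re.sub(r"\\u([0-9a-fA-F]{4})",
                  lambda m: chr(int(m.group(1), 16)),
                  text)

# Hypothetical stand-ins for test.txt and out.txt from the question.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "test.txt")
dst = os.path.join(workdir, "out.txt")

# Create the input file with the escaped form of the string.
with io.open(src, "w", encoding="utf-8") as f:
    f.write("Gro\\u00DFbritannien\n")

with io.open(src, "r", encoding="utf-8") as f:
    line = f.readline()

# Write the unescaped text back out as UTF-8.
with io.open(dst, "w", encoding="utf-8") as out:
    out.write(unescape_unicode(line))

with io.open(dst, "r", encoding="utf-8") as f:
    result = f.read()
```

Opening the output file with an explicit encoding plays the same role as codecs.open('out.txt', 'w', 'utf-8') in the question.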