How to convert unicode to unicode-escaped text

Question

I'm loading a file with a bunch of unicode characters (e.g. \xe9\x87\x8b). I want to convert these characters to their escaped-unicode form (\u91cb) in Python. I've found a couple of similar questions here on StackOverflow, including "Evaluate UTF-8 literal escape sequences in a string in Python3", which does almost exactly what I want, but I can't work out how to save the data.

For example, input file:

\xe9\x87\x8b

Python script:

file = open("input.txt", "r")
text = file.read()
file.close()
encoded = text.encode().decode('unicode-escape').encode('latin1').decode('utf-8')
file = open("output.txt", "w")
file.write(encoded) # fails with a unicode exception
file.close()

Output file (that I would like):

\u91cb

Answer 1

You need to encode it again with the unicode-escape encoding.

>>> br'\xe9\x87\x8b'.decode('unicode-escape').encode('latin1').decode('utf-8')
'釋'
>>> _.encode('unicode-escape')
b'\\u91cb'

Modified code (uses binary mode to avoid unnecessary encode/decode steps):

with open("input.txt", "rb") as f:
    text = f.read().rstrip()  # rstrip removes trailing whitespace, including the final newline
decoded = text.decode('unicode-escape').encode('latin1').decode('utf-8')
with open("output.txt", "wb") as f:
    f.write(decoded.encode('unicode-escape'))

http://asciinema.org/a/797ruy4u5gd1vsv8pplzlb6kq
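The decode/encode chain above is easier to follow when each step is named. The following is a minimal sketch of the same chain; the function name utf8_escapes_to_unicode_escapes is hypothetical and not part of the answer, and the input is assumed to contain only literal \xNN escapes that form valid UTF-8.

```python
def utf8_escapes_to_unicode_escapes(raw: bytes) -> bytes:
    """Hypothetical helper: turn literal \\xNN text into literal \\uNNNN text."""
    # Step 1: interpret the backslash escapes. Each \xNN becomes code point U+00NN.
    as_code_points = raw.decode('unicode-escape')
    # Step 2: latin1 maps U+0000..U+00FF one-to-one onto bytes 0x00..0xFF,
    # which recovers the real UTF-8 bytes.
    utf8_bytes = as_code_points.encode('latin1')
    # Step 3: decode the UTF-8 bytes into actual text.
    text = utf8_bytes.decode('utf-8')
    # Step 4: escape every non-ASCII character as \uNNNN.
    return text.encode('unicode-escape')


# rb'...' keeps the backslashes literal, like the 12 bytes stored in input.txt.
result = utf8_escapes_to_unicode_escapes(rb'\xe9\x87\x8b')
print(result)  # b'\\u91cb'
```

The result is a bytes object, which is why the output file in the answer is opened in binary mode.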

Answer 2

\xe9\x87\x8b is not a Unicode character. It looks like the representation of a bytestring that holds the Unicode character 釋 encoded using the utf-8 character encoding. \u91cb is a representation of the 釋 character in Python source code (or in JSON format). Don't confuse the text representation with the character itself:

>>> b"\xe9\x87\x8b".decode('utf-8')
u'\u91cb' # repr()
>>> print(b"\xe9\x87\x8b".decode('utf-8'))
釋
>>> import unicodedata
>>> unicodedata.name(b"\xe9\x87\x8b".decode('utf-8'))
'CJK UNIFIED IDEOGRAPH-91CB'
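One quick way to tell the character from its escaped representation is to compare lengths. This is a small sketch assuming Python 3; the variable names are illustrative only.

```python
char = b"\xe9\x87\x8b".decode("utf-8")  # the character itself
escaped = r"\u91cb"                      # raw string: backslash, u, 9, 1, c, b

print(len(char))        # 1
print(len(escaped))     # 6
print(char == escaped)  # False

# Interpreting the escape sequence turns the representation into the character.
restored = escaped.encode("ascii").decode("unicode-escape")
print(restored == char)  # True
```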

To read text encoded as utf-8 from a file, specify the character encoding explicitly:

with open('input.txt', encoding='utf-8') as file:
    unicode_text = file.read()

It is exactly the same for saving Unicode text to a file:

with open('output.txt', 'w', encoding='utf-8') as file:
    file.write(unicode_text)

If you omit the explicit encoding parameter, then locale.getpreferredencoding(False) is used, which may produce mojibake if it does not correspond to the actual character encoding used to save the file.
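To illustrate how a mismatched encoding produces mojibake, here is a minimal sketch that writes 釋 as utf-8 and then reads it back with two different encodings. The temporary file path and the choice of latin-1 as the "wrong" encoding are assumptions made for the demonstration.

```python
import os
import tempfile

# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "demo.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("釋")

# Reading with the encoding that was used for writing gives the text back.
with open(path, encoding="utf-8") as f:
    right = f.read()

# Reading with a different encoding does not raise here; it silently
# turns each of the three UTF-8 bytes into a separate character.
with open(path, encoding="latin-1") as f:
    wrong = f.read()

print(right == "釋")  # True
print(len(wrong))     # 3
```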

If your input file literally contains \xe9 (4 characters), then you should fix whatever software generates it. If you need to use 'unicode-escape', something is broken.

Answer 3

It looks as if your input file is UTF-8 encoded, so specify UTF-8 encoding when you open the file (Python 3 is assumed, as per your reference):

with open("input.txt", "r", encoding='utf8') as f:
    text = f.read()

text will contain the content of the file as a str (i.e. a unicode string). Now you can write it in unicode-escaped form directly to a file by specifying encoding='unicode-escape':

with open('output.txt', 'w', encoding='unicode-escape') as f:
    f.write(text)

The file will now contain unicode-escaped literals:

$ cat output.txt
\u91cb
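The write step can be checked by reading the file back in binary mode. The sketch below assumes Python 3 and uses a temporary path rather than the output.txt from the answer. It also shows one thing worth knowing about this codec: it escapes newlines as well as non-ASCII characters.

```python
import os
import tempfile

text = "釋"
# Hypothetical scratch file, used only for this demonstration.
path = os.path.join(tempfile.mkdtemp(), "output.txt")

with open(path, "w", encoding="unicode-escape") as f:
    f.write(text)

# Binary mode shows the bytes that actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()
print(raw)  # b'\\u91cb'

# Decoding with the same codec restores the original text.
print(raw.decode("unicode-escape") == text)  # True

# Newlines are escaped too, so multi-line text ends up on a single line.
multi = "釋\n".encode("unicode-escape")
print(multi)  # b'\\u91cb\\n'
```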
