
python3 decode external utf8 string

Suppose I have the following string, which I want to decode as UTF-8:

str = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'
# expect 'אאא'

Using Python 3, I would expect the following to work, but it doesn't:

bytes(str, 'ascii').decode('unicode-escape')
# prints '×××'
bytes(str, 'ascii').decode('utf-8')
# prints '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'

Any help?

You can do it with multiple trips through encode / decode.

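st = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'  # the string from the question, named st so it does not shadow the built-in str
print(st.encode('ascii').decode('unicode-escape').encode('iso-8859-1').decode('utf-8'))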

The first step, encode('ascii'), is the preferred alternative to calling bytes(). The second converts the escape sequences to their equivalent characters. The third takes advantage of the fact that the first 256 Unicode code points match ISO-8859-1, so those characters can be converted directly back into bytes of the same value. Finally, the resulting bytes are decoded as UTF-8.
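To see what each of the four steps produces, here is a minimal sketch that splits the chain into named intermediate values. The helper name decode_escaped_utf8 and the step1 to step4 variables are only illustrative and are not part of the original answer. The sketch assumes the input is pure ASCII and that every escape is below U+0100; otherwise one of the encode calls raises UnicodeEncodeError.

```python
def decode_escaped_utf8(escaped: str) -> str:
    # Hypothetical helper: the input holds literal backslash-u escapes
    # whose values are really UTF-8 byte values.
    raw = escaped.encode('ascii')            # text -> ASCII bytes
    chars = raw.decode('unicode-escape')     # escapes -> characters U+0000..U+00FF
    utf8_bytes = chars.encode('iso-8859-1')  # each character -> the byte with the same value
    return utf8_bytes.decode('utf-8')        # bytes -> the intended text


st = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'

step1 = st.encode('ascii')
step2 = step1.decode('unicode-escape')
step3 = step2.encode('iso-8859-1')
step4 = step3.decode('utf-8')

print(len(st), len(step2), len(step4))  # 36 6 3
print(step3)                            # b'\xd7\x90\xd7\x90\xd7\x90'
print(step4 == '\u05d0' * 3)            # True: three HEBREW LETTER ALEF characters
print(decode_escaped_utf8(st) == step4) # True
```

The byte pair D7 90 is the UTF-8 encoding of U+05D0, which is why six escaped values collapse into three characters. Python also accepts 'latin-1' as an alias for 'iso-8859-1'.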
