python3 decode external utf8 string
Suppose I have the following string that I want to decode as UTF-8:
st = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'
# expect 'אאא'
Using Python 3, I would expect the following to work, but it doesn't:
bytes(st, 'ascii').decode('unicode-escape')
# prints '×××'
bytes(st, 'ascii').decode('utf-8')
# prints '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'
Any help?
You can do it with multiple trips through encode / decode.
print(st.encode('ascii').decode('unicode-escape').encode('iso-8859-1').decode('utf-8'))
The first step, encode('ascii'), is the preferred alternative to calling bytes. The second, decode('unicode-escape'), converts the escape sequences to their equivalent characters. The third, encode('iso-8859-1'), takes advantage of the first 256 Unicode code points matching ISO-8859-1, so each of those characters maps directly back to a single byte with the same value. Finally, decode('utf-8') decodes the resulting bytes as UTF-8.
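To make the four steps easier to follow, here is a minimal sketch that splits the chain into named intermediate values. The helper name decode_escaped_utf8 and the variable names are illustrative assumptions, not part of the original answer, and the sketch assumes the input contains only ASCII characters and escapes below \u0100.

```python
def decode_escaped_utf8(escaped: str) -> str:
    """Hypothetical helper: turn text holding \\uXXXX escapes of UTF-8 bytes into real text."""
    # Step 1: str -> bytes; fails loudly if the input is not pure ASCII.
    raw = escaped.encode('ascii')
    # Step 2: interpret the \uXXXX escapes; each becomes one character in U+0000..U+00FF.
    chars = raw.decode('unicode-escape')
    # Step 3: map each of those characters back to the byte with the same value.
    utf8_bytes = chars.encode('iso-8859-1')
    # Step 4: decode the recovered bytes as UTF-8.
    return utf8_bytes.decode('utf-8')


st = '\\u00d7\\u0090\\u00d7\\u0090\\u00d7\\u0090'
decoded = decode_escaped_utf8(st)
assert decoded == 'אאא'
```

If the input contains an escape above \u00ff, step 3 raises UnicodeEncodeError, which is probably the right signal that the data was not escaped UTF-8 bytes in the first place.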