Python ASCII to Unicode

Question

I know how to get '4f60597d' from u'\u4f60\u597d':

>>> u_str= u'你好'
>>> repr(u_str).replace('\u', '')[2:-1] 
'4f60597d'

But if the string also contains some ASCII characters:

>>> u_str= u'12你好'    
>>> repr(u_str).replace('\u', '')[2:-1] 
'124f60597d'

This is not the result I want.

I would like to get output like this: 003100324f60597d

Could you tell me how?

Answer 1

You could use ord() to get the integer code point of each character and format that instead:

''.join(format(ord(c), '04x') for c in u_str)

Demo:

>>> u_str = u'12你好'  
>>> ''.join(format(ord(c), '04x') for c in u_str)
'003100324f60597d'
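
For readers on Python 3, the same idea can be wrapped in a small function. This is only a minimal sketch: the name `to_hex_codepoints` is hypothetical and not part of the original answer. Note that `'04x'` pads to at least four hex digits but does not truncate, so a character outside the BMP comes out as five digits here.

```python
def to_hex_codepoints(text):
    """Return every character's code point as at least four lowercase hex digits."""
    # format(..., '04x') zero-pads to four digits; larger code points keep all their digits.
    return ''.join(format(ord(ch), '04x') for ch in text)


print(to_hex_codepoints('12你好'))  # 003100324f60597d
```

If fixed-width 16-bit units are needed for every character, encoding to UTF-16 is likely the safer route.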

Or you could encode to UTF-16 (big-endian) and use binascii.hexlify() on the result; this is probably the faster option:

from binascii import hexlify

hexlify(u_str.encode('utf-16-be'))

Demo:

>>> from binascii import hexlify
>>> hexlify(u_str.encode('utf-16-be'))
'003100324f60597d'

The latter also handles characters outside of the BMP, which need 4 bytes per code point and are encoded as UTF-16 surrogate pairs:

>>> hexlify(u'\U0001F493'.encode('utf-16-be'))
'd83ddc93'
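
The sessions above print a plain string, which suggests Python 2. On Python 3, hexlify() returns a bytes object, so the result needs decoding to get a str. A minimal sketch, assuming Python 3; the helper name `to_utf16_hex` is hypothetical:

```python
from binascii import hexlify


def to_utf16_hex(text):
    """Hex-encode the UTF-16 big-endian bytes of text (hypothetical helper)."""
    # The explicit 'utf-16-be' codec does not prepend a byte order mark.
    encoded = text.encode('utf-16-be')
    # hexlify() returns bytes on Python 3, so decode to get a str.
    return hexlify(encoded).decode('ascii')


print(to_utf16_hex('12你好'))       # 003100324f60597d
print(to_utf16_hex('\U0001F493'))  # d83ddc93
```

On Python 3.5 and later, `text.encode('utf-16-be').hex()` should give the same string without the import.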
