
How can I generate \x-escaped UTF-8 in Python?

I want to convert a unicode input to a \x-escaped, 7-bit-ASCII-clean representation of a UTF-8 byte sequence.

This is analogous to what I need, but instead of "\u222a" I would like to generate "\xe2\x88\xaa":

>>> import codecs
>>> codecs.encode(u"\u222A", 'ascii', 'backslashreplace')
'\\u222a'

This looks like it is generating the desired result:

>>> u"\u222A".encode('utf-8')
'\xe2\x88\xaa'

But that is merely an escaped representation. The actual result isn't 12 ASCII bytes, it's 3 UTF-8 bytes:

>>> [ord(c) for c in u"\u222A".encode('utf-8')]
[226, 136, 170]
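
The distinction between the encoded bytes and their displayed form can be checked directly. The following is a small Python 3 sketch (the transcript above is Python 2; in Python 3, repr of a bytes object also adds a b prefix), with variable names chosen only for illustration.

```python
raw = "\u222a".encode("utf-8")   # the actual encoded data
shown = repr(raw)                # the text the interpreter displays for it

print(list(raw))    # [226, 136, 170]
print(len(raw))     # 3
print(shown)        # b'\xe2\x88\xaa'
print(len(shown))   # 15
```

The 15 characters are the 12 escape characters plus the b prefix and the two quotes.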

I could abuse that escaped representation to get what I want, stripping off the leading and trailing quotes that repr adds:

>>> repr(u"\u222A".encode('utf-8'))[1:-1]
'\\xe2\\x88\\xaa'
>>> [ord(c) for c in repr(u"\u222A".encode('utf-8'))[1:-1]]
[92, 120, 101, 50, 92, 120, 56, 56, 92, 120, 97, 97]

Yuck. This is a little better:

>>> import binascii
>>> ''.join('\\x' + binascii.hexlify(c) for c in u"\u222A".encode('utf-8'))
'\\xe2\\x88\\xaa'
>>> [ord(c) for c in ''.join('\\x' + binascii.hexlify(c) for c in u"\u222A".encode('utf-8'))]
[92, 120, 101, 50, 92, 120, 56, 56, 92, 120, 97, 97]

Is there a better way to do this?

>>> u'\u222A'.encode('utf-8').encode('string-escape')
'\\xe2\\x88\\xaa'
>>> print u'\u222A'.encode('utf-8').encode('string-escape')
\xe2\x88\xaa
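
The string-escape codec used above exists only in Python 2. For Python 3, a rough equivalent might be sketched by passing the UTF-8 bytes through Latin-1 and the unicode_escape codec. The helper name escape_utf8 is made up for illustration, and this is one possible approach rather than a canonical one.

```python
def escape_utf8(text):
    """Return an ASCII-only str with \\x escapes for the UTF-8 bytes of text.

    Hypothetical helper (the name is not from the original post), Python 3 only.
    """
    raw = text.encode("utf-8")
    # Latin-1 maps each byte 0-255 to the code point with the same value,
    # so unicode_escape then emits \xNN for every non-ASCII byte.
    return raw.decode("latin-1").encode("unicode_escape").decode("ascii")


escaped = escape_utf8("\u222a")
print(escaped)       # \xe2\x88\xaa
print(len(escaped))  # 12
```

Printable ASCII passes through unchanged, while backslashes and control characters such as newlines are escaped as well.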

I don't think you'll find a solution that isn't ugly. Here's one that retains any printable ASCII characters that are in the original string without converting them to a hex sequence.

''.join(c if 32 <= ord(c) < 127 else '\\x{:02x}'.format(ord(c)) for c in u"\u222A".encode('utf-8'))
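
In Python 3, iterating over a bytes object yields integers, so the ord() calls are not needed there. Below is a minimal sketch of the same idea for Python 3. The function name escape_non_printable is hypothetical, and escaping the backslash itself as \x5c is an added assumption to keep the output unambiguous, not something the one-liner above does.

```python
def escape_non_printable(text):
    """Hypothetical helper: keep printable ASCII, \\x-escape every other UTF-8 byte."""
    parts = []
    for byte in text.encode("utf-8"):  # iterating bytes yields ints in Python 3
        # Assumption: the backslash (0x5C) is escaped too, so that a literal
        # backslash in the input cannot be mistaken for an escape sequence.
        if 32 <= byte < 127 and byte != 0x5C:
            parts.append(chr(byte))
        else:
            parts.append("\\x{:02x}".format(byte))
    return "".join(parts)


print(escape_non_printable("A \u222a B"))  # A \xe2\x88\xaa B
```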
