Encode string as octal utf-8 Python 3
Is there a good way to encode strings to utf-8, but in octal format instead of the default hexadecimal?
For example:
>>> "õ".encode("utf-8")
b'\xc3\xb5'
Here the output is hex, not octal. The output in octal would be: b'\303\265'
Python 3 automatically handles the decoding just fine:
>>> b"\xc3\xb5".decode("utf-8")
'õ'
>>> b'\303\265'.decode("utf-8")
'õ'
Is there a codec or option I'm missing? I'd like to avoid a lot of manual string manipulation.
Update: I had misunderstood: there is no difference between b"\xc3\xb5" and b'\303\265' at all. They are just two different ways to write the same underlying bytes. In fact:
>>> b"\xc3\xb5" == b'\303\265'
True
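To make the point in the update concrete, here is a small sketch (the variable names are only illustrative) showing that both spellings hold the same integer values, and that repr() falls back to hex escapes regardless of how the literal was typed:

```python
hex_form = b"\xc3\xb5"
octal_form = b"\303\265"

# Iterating over a bytes object yields plain integers (0-255).
print(list(hex_form))    # [195, 181]
print(list(octal_form))  # [195, 181]

# repr() uses hex escapes for these bytes, whichever form was typed.
print(repr(octal_form))  # b'\xc3\xb5'
```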
Here's a class that overrides the representation of the string it wraps:
>>> class OctUTF8:
...     def __init__(self, s):
...         self.s = s.encode()
...     def __repr__(self):
...         return "b'" + ''.join(f'\\{n:03o}' for n in self.s) + "'"
...
>>> s='õ'
>>> OctUTF8(s)
b'\303\265'
This representation can be evaluated as a byte string and decoded back to the original:
>>> eval(repr(OctUTF8(s))).decode()
'õ'
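If calling eval on arbitrary text is a concern, one possible alternative (not part of the answer above) is ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch, with the octal representation written out by hand:

```python
import ast

# The text that the octal repr in the example above produces, written out by hand.
literal = r"b'\303\265'"

# ast.literal_eval parses Python literals only and will not run arbitrary code.
decoded = ast.literal_eval(literal).decode("utf-8")
print(decoded)  # õ
```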
First, you can use ord() to convert a character in a string to its Unicode code point, then you can use oct():
print(oct(ord("õ")))
Output:
0o365
You can convert each byte in a bytes object to its octal representation:
[oct(b) for b in "õ".encode("utf-8")]
Gives:
['0o303', '0o265']
You can manipulate the results to convert them to your desired output.
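As one way to do that manipulation, the sketch below strips the '0o' prefix from each string and pads to three digits. The names octal_strings and escaped are made up for this illustration:

```python
octal_strings = [oct(b) for b in "õ".encode("utf-8")]  # ['0o303', '0o265']

# Drop the '0o' prefix and left-pad to three digits so each escape is unambiguous.
escaped = "".join("\\" + o[2:].zfill(3) for o in octal_strings)
print("b'" + escaped + "'")  # b'\303\265'
```

The zfill(3) padding matters for small byte values: a newline byte, for instance, comes out of oct() as '0o12' and is written as \012.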