
Encode string as octal utf-8 Python 3

Is there a good way to encode a string to UTF-8 but have the bytes shown in octal instead of the default hexadecimal?

For example:

>>> "õ".encode("utf-8")
b'\xc3\xb5'

Here the output is hex, not octal. The output in octal would be: b'\303\265'

Python 3 handles the decoding just fine either way:

>>> b"\xc3\xb5".decode("utf-8")
'õ'
>>> b'\303\265'.decode("utf-8")
'õ'

Is there a codec or option I'm missing? I'd like to avoid a lot of manual string manipulation.

Update: I had misunderstood. There is no difference between b"\xc3\xb5" and b'\303\265' at all; they are just two different ways to write the same underlying bytes. In fact:

>>> b"\xc3\xb5" == b'\303\265'
True
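As a quick sketch of why the question's output looked the way it did: both spellings produce the same two byte values, and repr() shows non-printable bytes as hex escapes no matter how the literal was typed. The variable names here are only illustrative.

```python
hex_form = b"\xc3\xb5"   # literal written with hex escapes
oct_form = b"\303\265"   # same bytes written with octal escapes

same = hex_form == oct_form
byte_values = list(oct_form)   # the integer value of each byte
shown = repr(oct_form)         # how Python displays the object

print(same, byte_values, shown)
```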

Here's a class that encodes the string it wraps and overrides its representation to show the bytes in octal:

>>> class OctUTF8:
...   def __init__(self,s):
...     self.s = s.encode()
...   def __repr__(self):
...     return "b'" + ''.join(f'\\{n:03o}' for n in self.s) + "'"
...
>>> s='õ'
>>> OctUTF8(s)
b'\303\265'

This representation can be evaluated as a byte string and decoded back to the original:

>>> eval(repr(OctUTF8(s))).decode()
'õ'
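Since eval() will run any expression it is given, a more cautious option for parsing such a literal may be ast.literal_eval from the standard library, which only accepts Python literals. A minimal sketch, with the octal text written out by hand here rather than taken from a class instance:

```python
import ast

# Text of a bytes literal; the raw string keeps the backslashes as-is.
octal_literal = r"b'\303\265'"

raw = ast.literal_eval(octal_literal)   # parses the literal into a bytes object
decoded = raw.decode("utf-8")

print(decoded)
```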

First, you can use ord() to convert a character in a string to its Unicode code point, then you can use oct(). Note that this gives the octal form of the code point (245), not of the UTF-8 bytes:

print(oct(ord("õ")))

Output:

0o365

You can convert each byte in a bytes object to its octal representation:

[oct(b) for b in "õ".encode("utf-8")]

This gives:

['0o303', '0o265']

You can then manipulate the results to get your desired output.
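One possible way to do that manipulation, as a sketch: format each byte as three octal digits preceded by a backslash and join the pieces. The helper name to_octal_escapes is made up for this example and is not part of any library.

```python
def to_octal_escapes(text: str, encoding: str = "utf-8") -> str:
    """Return the encoded bytes of `text` as a string of \\ooo escapes."""
    # :03o formats each byte value as zero-padded, three-digit octal.
    return "".join(f"\\{byte:03o}" for byte in text.encode(encoding))

result = to_octal_escapes("õ")
print(result)
```

The result is an ordinary str meant for display; the bytes object itself stays the same whichever notation is used to show it.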
