What is the correct way to convert a Unicode string to hexadecimal in Python 3? [on hold]

Question

I found two ways to convert a string to hexadecimal in Python:

The first way:

ss = "Ế string"
sss = [hex(ord(sc)) for sc in ss]
ssss = ''.join(sss).replace('0x', '')
print(ssss)  # The result is 1ebe20737472696e67

The second way:

import codecs
ss = "Ế string"
sss = codecs.encode(codecs.encode(ss, 'utf-8'), 'hex')
print(sss.decode('utf-8'))  # The result is: e1babe20737472696e67

The two ways return different results. Which one is the correct code?

Answer 1

I don't know why you want to convert the string to hex like this, but I think the second way is better:

ss = "Ế string"
# First encode the string to get its UTF-8 bytes.
ss = ss.encode('utf-8')
# Iterating over bytes yields integers (0-255); format each as two hex digits
# so that values below 0x10 keep their leading zero.
sss = [format(sc, '02x') for sc in ss]

print(''.join(sss))  # e1babe20737472696e67

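Why: encode converts the string into a sequence of bytes, which Python exposes as a sequence of integers. Each value is one byte of a character's representation in the chosen codec ('utf8'). That byte sequence changes from one codec to another, so encoding the string with a different codec produces a different hex representation.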
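As a rough illustration of the point above, the sketch below encodes the same string with two codecs and prints the hex of each byte sequence. It assumes Python 3.5 or later for bytes.hex(), which zero-pads every byte; the variable names are only illustrative and are not from the original posts.

```python
# "\u1ebe" is the character Ế, written as an escape so the source file's
# own encoding or normalization cannot change it.
text = "\u1ebe string"

# The same text produces different bytes, and so different hex, per codec.
utf8_hex = text.encode("utf-8").hex()
utf16_hex = text.encode("utf-16-be").hex()

print(utf8_hex)   # e1babe20737472696e67
print(utf16_hex)  # 1ebe00200073007400720069006e0067

# The hex string can be turned back into text only with the matching codec.
round_trip = bytes.fromhex(utf8_hex).decode("utf-8")
print(round_trip == text)  # True
```

Which codec is "correct" depends on what will consume the hex. UTF-8 is a common choice when the bytes are stored or sent elsewhere.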

Answer 2

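I found the answer: the first way outputs the Unicode code point of each character, which for a character such as "ẵ" is the same as its UTF-16BE encoding, while the second way outputs the UTF-8 encoding. If I change the second way to codecs.encode(codecs.encode("ẵ", "utf-16be"), "hex"), the two return the same result for that character.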
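A small sketch of that comparison follows. The helper names codepoint_hex and utf16be_hex are hypothetical and introduced only for illustration. It suggests that the two outputs match for a single character like "ẵ" but diverge once the string contains characters with shorter code points, because the first way does not pad each value to four hex digits.

```python
def codepoint_hex(text):
    # Hypothetical helper: hex of each code point, joined without padding
    # (equivalent to the first way in the question).
    return "".join(format(ord(ch), "x") for ch in text)


def utf16be_hex(text):
    # Hypothetical helper: hex of the UTF-16BE bytes, two digits per byte.
    return text.encode("utf-16-be").hex()


single = "\u1eb5"        # the character ẵ
mixed = "\u1ebe string"  # the string from the question, "Ế string"

print(codepoint_hex(single), utf16be_hex(single))  # 1eb5 1eb5
print(codepoint_hex(mixed))  # 1ebe20737472696e67
print(utf16be_hex(mixed))    # 1ebe00200073007400720069006e0067
```

Because the unpadded form cannot be split back into characters unambiguously, encoding to bytes first is usually the safer route when the hex has to be decoded again.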

Solution 1
-1 2019-10-27 08:03:17

Solution 2
-2 2019-10-27 07:27:00
