How do I convert a Unicode integer to a UTF-8 character?

Question

I have a set of Unicode code points stored as integers, and I'd like to encode them as UTF-8. If I understand correctly, UTF-8 is just an encoding for integers (the fact that it's used for Unicode in particular isn't fundamental to UTF-8), so this should just be a matter of encoding an integer in UTF-8. Is there a standard utility for doing this, and if not, is there an easy way of doing it manually?

Answer 1

There is an easy way. If you are on Windows, you might run into problems with code points beyond what a single UTF-16 code unit can hold, due to platform limitations. On Linux you should be safe with the full Unicode range.

>>> unicode_codepoints = [1234, 2345, 3456, 4576] # example code points

>>> [chr(i) for i in unicode_codepoints] # step 1: chr() turns each integer into a one-character string
['Ӓ', 'ऩ', '\u0d80', 'ᇠ']

>>> "".join([chr(i) for i in unicode_codepoints]) # step 2: join the characters into one string
'Ӓऩ\u0d80ᇠ'

>>> "".join([chr(i) for i in unicode_codepoints]).encode("utf-8") # step 3: encode the string as UTF-8
b'\xd3\x92\xe0\xa4\xa9\xe0\xb6\x80\xe1\x87\xa0'

The last line is the result you are looking for.
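For the "doing it manually" part of the question, here is a minimal sketch of the bit-packing that UTF-8 performs on a single code point. The function name `encode_utf8` is made up for this illustration and is not a standard utility; in practice `chr(i).encode("utf-8")` does the same job. Note that the sketch rejects surrogates (0xD800 to 0xDFFF), which well-formed UTF-8 does not allow, so UTF-8 is not quite an encoding for arbitrary integers.

```python
def encode_utf8(code_point: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand (illustrative helper)."""
    if not 0 <= code_point <= 0x10FFFF:
        raise ValueError("not a Unicode code point")
    if 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("surrogates cannot be encoded as UTF-8")
    if code_point < 0x80:
        # 1 byte: 0xxxxxxx
        return bytes([code_point])
    if code_point < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (code_point >> 6),
                      0x80 | (code_point & 0x3F)])
    if code_point < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (code_point >> 12),
                      0x80 | ((code_point >> 6) & 0x3F),
                      0x80 | (code_point & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (code_point >> 18),
                  0x80 | ((code_point >> 12) & 0x3F),
                  0x80 | ((code_point >> 6) & 0x3F),
                  0x80 | (code_point & 0x3F)])


# Same example code points as in the session above
print(b"".join(encode_utf8(cp) for cp in [1234, 2345, 3456, 4576]))
# b'\xd3\x92\xe0\xa4\xa9\xe0\xb6\x80\xe1\x87\xa0'
```

The output matches the result of the built-in approach, so the hand-written version is mainly useful for seeing how the bits are laid out.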

Answer 2

This might be self-explanatory:

>>> [ord(c) for c in ('a', 'ö', '🤗')]
[97, 246, 129303]

>>> [chr(n) for n in [97, 246, 129303]]
['a', 'ö', '🤗']

Both chr and ord are built-in functions.
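To get from those integers to actual UTF-8 bytes, one option is to chain `chr` with `str.encode`. The following is a small sketch using the same three code points; the variable names are just for illustration. It also shows that the encoded length varies from one to four bytes depending on the code point.

```python
code_points = [97, 246, 129303]

# chr() gives the one-character string, .encode() gives its UTF-8 bytes
encoded = [chr(n).encode("utf-8") for n in code_points]
print(encoded)
# [b'a', b'\xc3\xb6', b'\xf0\x9f\xa4\x97']

# Going back: decode the bytes, then ord() recovers the integer
decoded = [ord(b.decode("utf-8")) for b in encoded]
print(decoded)
# [97, 246, 129303]
```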

Answer 1
3, accepted, 2022-05-05 19:03:02

Answer 2
0, 2022-05-05 18:57:59
