How to convert a Unicode integer to a UTF-8 character?

I have a set of Unicode code points stored as integers, and I'd like to encode them as UTF-8. If I understand correctly, UTF-8 is just an encoding for integers (the fact that it is used for Unicode in particular isn't fundamental to UTF-8), so this should just be a matter of encoding each integer in UTF-8. Is there a standard utility for doing this, and if not, is there an easy way of doing it manually?
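For the "manually" part of the question, here is a minimal sketch of the UTF-8 bit layout defined in RFC 3629. The function name `encode_utf8_codepoint` is hypothetical, not a standard utility; the sketch assumes input code points in the Unicode range and checks its output against Python's built-in codec.

```python
def encode_utf8_codepoint(cp: int) -> bytes:
    """Hand-rolled UTF-8 encoding of a single code point (illustrative sketch)."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError(f"code point out of range: {cp:#x}")
    if 0xD800 <= cp <= 0xDFFF:
        # Surrogates are reserved for UTF-16 and are not valid in UTF-8.
        raise ValueError(f"surrogate code point: {cp:#x}")
    if cp < 0x80:
        # 1 byte: 0xxxxxxx (plain ASCII)
        return bytes([cp])
    if cp < 0x800:
        # 2 bytes: 110xxxxx 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


codepoints = [1234, 2345, 3456, 4576]
manual = b"".join(encode_utf8_codepoint(cp) for cp in codepoints)
# Sanity check against the built-in UTF-8 codec.
assert manual == "".join(map(chr, codepoints)).encode("utf-8")
print(manual)  # b'\xd3\x92\xe0\xa4\xa9\xe0\xb6\x80\xe1\x87\xa0'
```

The range and surrogate checks mark where "UTF-8 as a generic integer encoding" stops applying: RFC 3629 limits UTF-8 to U+10FFFF and excludes the surrogate range.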

There is an easy way. If you are on Windows, you might run into problems with code points above U+FFFF (characters that need a surrogate pair in UTF-16) due to platform limitations; this affects narrow builds such as Python 2 on Windows, whereas Python 3.3 and later support the full range on every platform. On Linux you should be safe with full Unicode.

>>> unicode_codepoints = [1234, 2345, 3456, 4576]  # example code points

>>> [chr(i) for i in unicode_codepoints]  # step 1: chr() turns each integer into a one-character str
['Ӓ', 'ऩ', '\u0d80', 'ᇠ']

>>> "".join(chr(i) for i in unicode_codepoints)  # step 2: join the characters into one string
'Ӓऩ\u0d80ᇠ'

>>> "".join(chr(i) for i in unicode_codepoints).encode("utf-8")  # step 3: encode the string as UTF-8 bytes
b'\xd3\x92\xe0\xa4\xa9\xe0\xb6\x80\xe1\x87\xa0'

The last line, a bytes object, is the result you are looking for.
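The three steps above can be folded into one reusable helper. This is a minimal sketch; the name `codepoints_to_utf8` is hypothetical. Invalid input fails loudly instead of producing bytes: `chr()` raises `ValueError` for integers outside 0 to 0x10FFFF, and the strict UTF-8 codec raises `UnicodeEncodeError` for lone surrogates (U+D800 to U+DFFF). The second call shows the case the Windows caveat is about: in Python 3.3 and later, a code point above U+FFFF is a single character and encodes to four UTF-8 bytes.

```python
def codepoints_to_utf8(codepoints):
    """Encode an iterable of integer code points as UTF-8 bytes (sketch)."""
    # chr() raises ValueError for values outside range(0x110000).
    text = "".join(chr(cp) for cp in codepoints)
    # The default "strict" error handler raises UnicodeEncodeError on lone surrogates.
    return text.encode("utf-8")


print(codepoints_to_utf8([1234, 2345, 3456, 4576]))
# A code point above U+FFFF: one str character, four UTF-8 bytes.
print(len(chr(0x1F917)), codepoints_to_utf8([0x1F917]))  # 1 b'\xf0\x9f\xa4\x97'
```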

This might be self-explanatory:

>>> [ord(c) for c in ('a', 'ö', '🤗')]
[97, 246, 129303]

>>> [chr(n) for n in [97, 246, 129303]]
['a', 'ö', '🤗']

Both chr and ord are built-in functions.
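The answer above stops at `str`. To tie it back to the question, here is a short sketch of the full round trip: integers to characters with `chr`, characters to UTF-8 bytes with `str.encode`, and back again with `bytes.decode` and `ord`. The variable names are illustrative only.

```python
codepoints = [97, 246, 129303]

# Integers -> str -> UTF-8 bytes
data = "".join(map(chr, codepoints)).encode("utf-8")
print(data)  # b'a\xc3\xb6\xf0\x9f\xa4\x97'

# UTF-8 bytes -> str -> integers
decoded = [ord(c) for c in data.decode("utf-8")]
print(decoded)  # [97, 246, 129303]
```

The three code points take 1, 2 and 4 bytes respectively, which is UTF-8's variable-length scheme at work.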

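Disclaimer: the technical posts on this site are published under the CC BY-SA 4.0 license. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.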

© 2020-2024 STACKOOM.COM