Python convert unicode character to html code and unicode number
Here is what I ultimately want:
A dictionary that holds Unicode characters as keys and the HTML code plus the Unicode number as list values.
Basic_Latin = {
    ...
    "@": ["U+0040", "&#64;"],
    ...
}
How can this be achieved if only the key is given?
I am thinking of something like this:
Basic_Latin = {
    ...
    "@": [to_unicode("@"), to_html("@")],
    ...
}
I found a lot of methods for converting the other way round, but none for what I am looking for.
All that the notations contain is the hexadecimal and decimal value of the character's Unicode codepoint. That value can easily be obtained by using the ord() function, then formatting the resulting integer:
codepoint = ord('@')
unicode_codepoint = 'U+{:04X}'.format(codepoint) # four-digit uppercase hex
html_escape = '&#{:d};'.format(codepoint) # decimal number
Or as a function:
def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))
The function returns a tuple rather than a list; presumably the value doesn't need to be mutable after all. You may want to consider using a namedtuple class so you can also use attribute access.
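The namedtuple suggestion could look something like the sketch below. The class name CodepointInfo and the field names unicode and html are hypothetical, picked only for illustration; they are not part of the original answer.

```python
from collections import namedtuple

# Hypothetical class and field names, chosen for this illustration only.
CodepointInfo = namedtuple('CodepointInfo', ['unicode', 'html'])

def codepoints(c):
    codepoint = ord(c)
    return CodepointInfo(
        'U+{:04X}'.format(codepoint),  # four-digit uppercase hex
        '&#{:d};'.format(codepoint),   # decimal HTML character reference
    )

info = codepoints('@')
print(info.unicode)  # U+0040
print(info.html)     # &#64;
```

A namedtuple is still a tuple, so indexing and unpacking keep working exactly as with the plain tuple version.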
Demo of the codepoints() function above:
>>> def codepoints(c):
...     codepoint = ord(c)
...     return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))
...
>>> codepoints('@')
('U+0040', '&#64;')
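To get the dictionary from the question, the same idea can be applied to a whole range of characters. The following is a minimal sketch, assuming Python 3 and assuming that only the printable Basic Latin characters (U+0020 through U+007E) are wanted; the range is my assumption, so adjust it as needed. This variant returns lists rather than tuples to match the layout in the question.

```python
def codepoints(c):
    codepoint = ord(c)
    # A list is returned here to match the structure shown in the question.
    return ['U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint)]

# Assumed range: printable Basic Latin, U+0020 (space) through U+007E (~).
Basic_Latin = {chr(n): codepoints(chr(n)) for n in range(0x20, 0x7F)}

print(Basic_Latin['@'])  # ['U+0040', '&#64;']
```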