
Python convert unicode character to html code and unicode number

Here is what I ultimately want:

A dictionary that holds Unicode characters as keys and, for each key, a list containing the Unicode number and the HTML code as the value.

Basic_Latin = {
        ...
        "@": ["U+0040", "&#64;"],
        ...
        }

How can this be achieved if only the key is given?

I am thinking of something like this:

Basic_Latin = {
        ...
        "@": [to_unicode("@"), to_html("@")],
        ...
        }

I find a lot of methods for converting the other way round, but not for what I am looking for.

Both notations contain nothing more than the Unicode codepoint of the character, written in hexadecimal (the U+ form) or in decimal (the HTML escape). That value can easily be obtained with the ord() function; then format the resulting integer:

codepoint = ord('@')
unicode_codepoint = 'U+{:04X}'.format(codepoint)  # uppercase hex, zero-padded to at least four digits
html_escape = '&#{:d};'.format(codepoint)         # decimal number

or as a function:

def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

The function returns a tuple rather than a list; presumably the result doesn't need to be mutable after all. You may want to consider using a namedtuple class so you can also use attribute access.
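As a rough sketch of the namedtuple idea, the function could return a small named record instead of a plain tuple. The class name CodepointInfo and the field names unicode and html are made up for this illustration and are not part of the original answer:

```python
from collections import namedtuple

# Hypothetical record type; the name and field names are assumptions.
CodepointInfo = namedtuple('CodepointInfo', ['unicode', 'html'])

def codepoints(c):
    codepoint = ord(c)
    return CodepointInfo(
        'U+{:04X}'.format(codepoint),  # hexadecimal U+ notation
        '&#{:d};'.format(codepoint),   # decimal HTML numeric reference
    )

info = codepoints('@')
print(info.unicode)  # U+0040
print(info.html)     # &#64;
```

A namedtuple still behaves like a regular tuple, so indexing and unpacking keep working alongside attribute access.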

Demo:

>>> def codepoints(c):
...     codepoint = ord(c)
...     return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))
...
>>> codepoints('@')
('U+0040', '&#64;')
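To get back to the dictionary from the question, one possible approach is a dict comprehension over a range of codepoints. This is only a sketch: it assumes Python 3 (where chr() accepts any codepoint), assumes that only the printable ASCII range U+0020 to U+007E is wanted, and returns lists to match the layout shown in the question:

```python
def codepoints(c):
    codepoint = ord(c)
    # A list here, to match the value layout asked for in the question
    return ['U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint)]

# Assumption: printable ASCII only (U+0020 through U+007E)
Basic_Latin = {chr(cp): codepoints(chr(cp)) for cp in range(0x20, 0x7F)}

print(Basic_Latin['@'])          # ['U+0040', '&#64;']
print(codepoints('\U0001F600'))  # ['U+1F600', '&#128512;']
```

The last line shows that characters beyond U+FFFF simply produce more than four hex digits, since {:04X} only sets a minimum width.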

