Python: convert a Unicode character to its HTML code and Unicode number

Question

Here is what I ultimately want:

A dictionary that holds Unicode characters as keys and, as list values, the Unicode number plus the HTML code.

Basic_Latin = {
        ...
        "@": ["U+0040", "&#64;"],
        ...
        }

How can this be achieved if only the key is given?

I am thinking of something like this:

Basic_Latin = {
        ...
        "@": [to_unicode("@"), to_html("@")],
        ...
        }

I find a lot of methods for converting the other way round, but none for what I am looking for.

Answer 1

All the two notations contain is the hexadecimal value (U+0040) and the decimal value (&#64;) of the character's Unicode codepoint. That value can easily be obtained with the ord() function, then formatting the resulting integer:

codepoint = ord('@')
unicode_codepoint = 'U+{:04X}'.format(codepoint)  # four-digit uppercase hex
html_escape = '&#{:d};'.format(codepoint)         # decimal number

or as a function:

def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

The function returns a tuple rather than a list; presumably the result doesn't need to be mutable after all. You may want to consider a namedtuple class so the two values are also available through attribute access.

Demo:
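The namedtuple suggestion could look roughly like the sketch below. The class name CodepointInfo, the field names unicode and html, and the function name codepoint_info are made up for illustration; they are not part of the original answer.

```python
from collections import namedtuple

# Hypothetical record type; the class and field names are illustrative only.
CodepointInfo = namedtuple('CodepointInfo', ['unicode', 'html'])

def codepoint_info(c):
    """Return the U+XXXX notation and the decimal HTML reference for one character."""
    codepoint = ord(c)
    return CodepointInfo('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

info = codepoint_info('@')
print(info.unicode)  # U+0040
print(info.html)     # &#64;
```

Because a namedtuple is still a tuple, index access (info[0], info[1]) and unpacking keep working alongside the attribute names.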

>>> def codepoints(c):
...     codepoint = ord(c)
...     return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))
...
>>> codepoints('@')
('U+0040', '&#64;')
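To get from the function to the dictionary asked for in the question, a dict comprehension over the desired characters should be enough. This is a minimal sketch assuming Python 3 (where chr() covers the full Unicode range) and assuming that only the printable characters U+0020 to U+007E of the Basic Latin block are wanted; adjust the range as needed.

```python
def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

# Assumption: only the printable part of Basic Latin (U+0020..U+007E) is wanted.
# list(...) converts the tuple to a list, matching the layout in the question.
Basic_Latin = {chr(cp): list(codepoints(chr(cp))) for cp in range(0x20, 0x7F)}

print(Basic_Latin['@'])  # ['U+0040', '&#64;']
```

If the keys already exist as a string or list of characters, iterating over that collection instead of a numeric range works the same way.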

Solution 1
1 accepted 2017-04-28 09:18:46