
Python convert unicode character to html code and unicode number

Here is what I ultimately want:

A dictionary that holds unicode chars as keys and html code + unicode number as list values.

Basic_Latin = {
        ...
        "@": ["U+0040", "&#64;"],
        ...
        }

How can this be achieved if only the key is given?

I think of something like this:

Basic_Latin = {
        ...
        "@": [to_unicode("@"), to_html("@")],
        ...
        }

I found a lot of methods for converting the other way round, but none for what I am looking for.

Both notations contain nothing more than the Unicode codepoint of the character, written in hexadecimal for the U+ form and in decimal for the HTML escape. You can obtain that value with the ord() function, then format the resulting integer:

codepoint = ord('@')
unicode_codepoint = 'U+{:04X}'.format(codepoint)  # uppercase hex, zero-padded to at least four digits
html_escape = '&#{:d};'.format(codepoint)         # decimal number

or as a function:

def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))

The function returns a tuple rather than a list; presumably the result doesn't need to be mutable. Consider using a namedtuple class so you can also access the values by attribute.
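A minimal sketch of the namedtuple variant, assuming Python 3. The class name CodepointInfo, the field names unicode and html, and the function name codepoints_named are hypothetical, chosen only for illustration:

```python
from collections import namedtuple

# Hypothetical class and field names, used here for illustration only.
CodepointInfo = namedtuple('CodepointInfo', ['unicode', 'html'])


def codepoints_named(c):
    """Return the U+ notation and the decimal HTML escape for one character."""
    codepoint = ord(c)
    return CodepointInfo(
        'U+{:04X}'.format(codepoint),  # hex, zero-padded to at least four digits
        '&#{:d};'.format(codepoint),   # decimal HTML numeric character reference
    )


info = codepoints_named('@')
print(info.unicode)  # U+0040
print(info.html)     # &#64;
```

The result still behaves like a plain tuple, so indexing and unpacking keep working alongside attribute access.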

Demo:

>>> def codepoints(c):
...     codepoint = ord(c)
...     return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))
...
>>> codepoints('@')
('U+0040', '&#64;')
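To get the dictionary from the question, one option is to generate it from a range of codepoints instead of typing each key. This is a sketch assuming Python 3 (where chr() accepts any codepoint). Limiting the range to the printable part of Basic Latin, U+0020 to U+007E, is an assumption made here, as is converting the tuple to a list to match the layout asked for:

```python
def codepoints(c):
    codepoint = ord(c)
    return ('U+{:04X}'.format(codepoint), '&#{:d};'.format(codepoint))


# Assumption: only the printable Basic Latin characters, U+0020 through U+007E.
# list(...) converts the tuple because the question asked for list values.
Basic_Latin = {
    chr(cp): list(codepoints(chr(cp)))
    for cp in range(0x20, 0x7F)
}

print(Basic_Latin['@'])  # ['U+0040', '&#64;']
```

If the values never need to change, dropping the list() call and keeping the tuples works just as well.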
