
Printing unicode character NAMES - e.g. 'GREEK SMALL LETTER ALPHA' - instead of 'α'

I am testing the function isprintable() and want to print the Unicode names of all characters in the string string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA") .

How can I print all the names, e.g. 'SPACE', 'NO-BREAK SPACE', 'HORIZONTAL TAB', 'GREEK SMALL LETTER ALPHA'?

import unicodedata, string

for e in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(ord(e))
    print(unicodedata.name(e))

I get the error 'ValueError: no such name':

32
SPACE
9
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ValueError: no such name

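As the comments indicate, the Unicode database doesn't have a name for every character (control characters have none), but NameAliases.txt lists aliases for them. The code below parses that file and returns an alias when unicodedata.name() fails. If a character has several aliases, the first one found in the file is used: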

import string
import requests
import unicodedata as ud

# Pull the official NameAliases.txt from the matching Unicode database
# the current Python was built with.
response = requests.get(f'https://www.unicode.org/Public/{ud.unidata_version}/ucd/NameAliases.txt', timeout=30)
response.raise_for_status()  # stop here rather than try to parse an error page

# Parse NameAliases.txt, storing the first instance of a code and a name
aliases = {}
for line in response.text.splitlines():
    if not line.strip() or line.startswith('#'):
        continue
    code,name,_ = line.split(';')
    val = chr(int(code,16))
    if val not in aliases:
        aliases[val] = name

# Return the first alias from NameAliases.txt if it exists when unicodedata.name() fails.
def name(c):
    try:
        return ud.name(c)
    except ValueError:
        return aliases.get(c,'<no name>')

for e in string.whitespace + ud.lookup("GREEK SMALL LETTER ALPHA"):
    print(f'U+{ord(e):04X} {name(e)}')

Output:

U+0020 SPACE
U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000D CARRIAGE RETURN
U+000B LINE TABULATION
U+000C FORM FEED
U+03B1 GREEK SMALL LETTER ALPHA
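
A side note on why the parsing is needed at all: as far as I know, unicodedata.lookup() has accepted the aliases from NameAliases.txt since Python 3.3, but unicodedata.name() never returns them. The short sketch below illustrates that asymmetry, and also uses the optional default argument of unicodedata.name() to avoid the try/except. The "<no name>" placeholder is just an illustrative choice.

```python
import unicodedata as ud

# lookup() resolves the alias to the tab character
# (alias support in lookup() is assumed here: Python 3.3 or later).
tab = ud.lookup("CHARACTER TABULATION")
print(tab == "\t")

# name() does not map the character back to any alias; passing a
# default as the second argument returns it instead of raising ValueError.
print(ud.name(tab, "<no name>"))

# Characters that do have a formal name are unaffected by the default.
alpha_name = ud.name("\u03b1", "<no name>")
print(alpha_name)
```

So the reverse mapping (character to alias) still has to come from somewhere else, such as the parsed file above.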

As mentioned in this Q&A linked by wjandrea in the comments, ASCII control characters do not have official names in the current Unicode standard, so you get a ValueError when you ask unicodedata.name() for them.

The curses.ascii module in the standard library provides a list, controlnames, of two- or three-character abbreviations for these characters. They correspond to the names listed in the Char column of the ASCII table (as displayed by man ascii ), but without the description.

So we can do this:

import string
import unicodedata
from curses.ascii import controlnames

for e in (string.whitespace + "\N{GREEK SMALL LETTER ALPHA}"):
    try:
        name = unicodedata.name(e)
    except ValueError:
        name = controlnames[ord(e)]
    print(name)

giving this result

SPACE
HT
LF
CR
VT
FF
GREEK SMALL LETTER ALPHA

which is not ideal, but may be the best that can be done without external resources such as the file used in this excellent answer.
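
One caveat with the loop above: controlnames only covers code points 0 through 32, so a nameless character outside that range (DEL, U+007F, for example) would raise IndexError. The following is a minimal sketch of a more defensive variant. The helper name short_name and the "<no name>" placeholder are hypothetical, and the ImportError guard assumes that curses may be missing on some builds (it is not shipped with stock Windows installs).

```python
import unicodedata

try:
    # curses is part of the standard library but is not available everywhere.
    from curses.ascii import controlnames
except ImportError:
    controlnames = []  # fall back to the placeholder for every control character


def short_name(ch, fallback="<no name>"):
    """Hypothetical helper: formal Unicode name, else ASCII abbreviation, else fallback."""
    # The second argument to name() is returned instead of raising ValueError.
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    code = ord(ch)
    # Only index into controlnames when the code point is inside the table.
    if code < len(controlnames):
        return controlnames[code]
    return fallback


for ch in "\t\x7f\x85\N{GREEK SMALL LETTER ALPHA}":
    print(f"U+{ord(ch):04X} {short_name(ch)}")
```

The bounds check matters when testing isprintable() against a wider range of characters than string.whitespace, since most non-printable characters fall outside the ASCII control range.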

