Printing Unicode character NAMES (e.g. 'GREEK SMALL LETTER ALPHA') instead of 'α'

I am testing the function isprintable(). I want to print the Unicode names of all characters in the string string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA").

How can I print all the names, e.g. 'SPACE', 'NO-BREAK SPACE', 'HORIZONTAL TAB', 'GREEK SMALL LETTER ALPHA'?

import unicodedata, string

for e in string.whitespace + unicodedata.lookup("GREEK SMALL LETTER ALPHA"):
    print(ord(e))
    print(unicodedata.name(e))

I get the error 'ValueError: no such name':

32
SPACE
9
Traceback (most recent call last):
  File "<stdin>", line 3, in <module>
ValueError: no such name
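If a placeholder is acceptable, the exception itself can be avoided: unicodedata.name() takes an optional second argument that is returned instead of raising ValueError. A minimal sketch; safe_name is a hypothetical helper name, not part of the question:

```python
import string
import unicodedata


def safe_name(ch, default="<no name>"):
    # unicodedata.name(ch, default) returns `default` instead of raising
    # ValueError when the character has no Name property (e.g. TAB).
    return unicodedata.name(ch, default)


for ch in string.whitespace + "\N{GREEK SMALL LETTER ALPHA}":
    print(f"U+{ord(ch):04X} {safe_name(ch)}")
```

This only suppresses the error; the control characters still come out as "<no name>".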

As the comments indicate, unicodedata.name() has no name for control characters such as TAB, but the Unicode Character Database file NameAliases.txt lists aliases for them. The code below parses that file and returns an alias when unicodedata.name() fails; in this case, the first alias found in the file:

import string
import requests
import unicodedata as ud

# Download the official NameAliases.txt for the Unicode version that this
# Python's unicodedata module was built with.
response = requests.get(f'https://www.unicode.org/Public/{ud.unidata_version}/ucd/NameAliases.txt')
response.raise_for_status()  # fail loudly on a bad download (e.g. 404)

# Parse NameAliases.txt, storing the first instance of a code and a name
aliases = {}
for line in response.text.splitlines():
    if not line.strip() or line.startswith('#'):
        continue
    code, name, _ = line.split(';')  # fields: code point;alias;type
    val = chr(int(code, 16))
    if val not in aliases:
        aliases[val] = name

# Return the first alias from NameAliases.txt if it exists when unicodedata.name() fails.
def name(c):
    try:
        return ud.name(c)
    except ValueError:
        return aliases.get(c, '<no name>')

for e in string.whitespace + ud.lookup("GREEK SMALL LETTER ALPHA"):
    print(f'U+{ord(e):04X} {name(e)}')

Output:

U+0020 SPACE
U+0009 CHARACTER TABULATION
U+000A LINE FEED
U+000D CARRIAGE RETURN
U+000B LINE TABULATION
U+000C FORM FEED
U+03B1 GREEK SMALL LETTER ALPHA
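The parsing step can be kept separate from the download, which makes it testable offline and lets you choose which kind of alias you prefer (each line of NameAliases.txt tags its alias with a type such as control or abbreviation). A sketch under those assumptions: parse_aliases and pick_alias are hypothetical helpers, and SAMPLE is a short hand-written excerpt in the file's "code point;alias;type" layout, not the real file:

```python
# Hand-written lines in the NameAliases.txt layout; the real file would come
# from the download in the answer above (response.text).
SAMPLE = """\
# Lines starting with '#' are comments.
0009;CHARACTER TABULATION;control
0009;HORIZONTAL TABULATION;control
0009;HT;abbreviation
000A;LINE FEED;control
000A;LF;abbreviation
"""


def parse_aliases(text):
    """Map each character to all of its (alias, type) pairs, in file order."""
    aliases = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        code, alias, kind = line.split(";")
        aliases.setdefault(chr(int(code, 16)), []).append((alias, kind))
    return aliases


def pick_alias(entries, kind=None):
    """First alias of the requested type, else the first alias, else None."""
    for alias, alias_kind in entries:
        if kind is None or alias_kind == kind:
            return alias
    return entries[0][0] if entries else None


aliases = parse_aliases(SAMPLE)
print(pick_alias(aliases["\t"]))                  # CHARACTER TABULATION
print(pick_alias(aliases["\t"], "abbreviation"))  # HT
```

Keeping every alias instead of only the first makes the "which name?" decision explicit; asking for "abbreviation" gives short forms such as HT and LF.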

As mentioned in this Q&A, linked by wjandrea in the comments, ASCII control characters do not have official names in the current Unicode standard, so you get a ValueError when you try to look them up.
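The lookup only fails in one direction, though: as far as I know, since Python 3.3 unicodedata.lookup() and "\N{...}" escapes also accept the formal name aliases for these characters, while unicodedata.name() still has nothing to return. A quick check:

```python
import unicodedata

# lookup() and "\N{...}" resolve name aliases (Python 3.3+) ...
tab = unicodedata.lookup("CHARACTER TABULATION")
lf = "\N{LINE FEED}"
print(repr(tab), repr(lf))  # '\t' '\n'

# ... but name() only reports the Name property, which is empty here.
print(unicodedata.name(tab, None))  # None
```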

The curses.ascii module in the standard library provides a list, controlnames, of short (two- or three-letter) "names" for these characters, corresponding to the Char column of the ASCII table (as displayed by man ascii), but without the description.
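To see what that list covers: controlnames is indexed by code point and has entries for 0 through 32 only (32 is "SP"; there is no entry for DEL). The same module's unctrl() gives caret notation instead. One assumption worth stating: importing curses.ascii loads the curses package, which needs the _curses extension and is usually unavailable on Windows.

```python
# May raise ImportError where the _curses extension is missing (typically Windows).
from curses.ascii import controlnames, unctrl

print(len(controlnames))                                    # 33
print(controlnames[9], controlnames[10], controlnames[32])  # HT LF SP
print(unctrl("\t"))                                         # ^I
```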

So we can do this:

import string
import unicodedata
from curses.ascii import controlnames

for e in (string.whitespace + "\N{GREEK SMALL LETTER ALPHA}"):
    try:
        name = unicodedata.name(e)
    except ValueError:
        name = controlnames[ord(e)]
    print(name)

giving this result:

SPACE
HT
LF
CR
VT
FF
GREEK SMALL LETTER ALPHA

which is not ideal, but may be the best that can be done without using external resources such as NameAliases.txt, as the excellent answer above does.
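One gap in the snippet above: controlnames only covers code points 0 to 32, so a nameless character outside that range (for example U+0085, a C1 control that str.isspace() treats as whitespace) would raise IndexError. A sketch of a guarded fallback that also prints isprintable(), since that is what the question was testing; char_name is a hypothetical helper, and the ImportError branch assumes curses may be unavailable:

```python
import string
import unicodedata

try:
    from curses.ascii import controlnames
except ImportError:
    # No _curses extension (typically Windows): skip the ASCII mnemonics.
    controlnames = []


def char_name(ch):
    """Official name if any, else an ASCII mnemonic, else a placeholder."""
    name = unicodedata.name(ch, None)
    if name is not None:
        return name
    if ord(ch) < len(controlnames):  # guards against IndexError past U+0020
        return controlnames[ord(ch)]
    return "<no name>"


for ch in string.whitespace + "\x85" + "\N{GREEK SMALL LETTER ALPHA}":
    print(f"U+{ord(ch):04X} {char_name(ch):<24} printable={ch.isprintable()}")
```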
