
Convert full-width Unicode characters into ASCII characters

I have some Unicode text containing full-width digits, as below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How can I convert the characters so that they are recognized as numbers?

If you actually have Unicode (or decode your byte string to Unicode), you can normalize the data with compatibility normalization (NFKC), which replaces each full-width character with its ASCII equivalent:

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
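
To get a feel for what NFKC touches beyond full-width digits, here is a small Python 3 sketch. The sample characters are my own picks, not from the question:

```python
import unicodedata

# Each sample is a "compatibility" character that NFKC folds to a plainer form.
samples = {
    "fullwidth digits": "\uff13\uff16",  # the two leading characters from the question
    "ligature fi": "\ufb01",             # a single-codepoint "fi" ligature
    "superscript two": "\u00b2",         # the superscript digit 2
}

normalized = {name: unicodedata.normalize("NFKC", text) for name, text in samples.items()}

for name, text in normalized.items():
    print(name, repr(text))
# fullwidth digits '36'
# ligature fi 'fi'
# superscript two '2'
```

So the ligature and the superscript are rewritten as well, which may or may not be what you want.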

If NFKC normalization changes too much for you, you can build a translation table of just the replacements you want:

#coding:utf8

repl = u'0123456789'

# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))

s = u'３６fsdfdsf１４'

print(s.translate(xlat))

Output:

36fsdfdsf14
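
If you want more than the digits but still less than full normalization, the same translate approach extends to the whole full-width ASCII block, since U+FF01 to U+FF5E mirror U+0021 to U+007E at a fixed offset. A minimal Python 3 sketch; the names FULLWIDTH_TO_ASCII and fullwidth_to_ascii are mine, and mapping the ideographic space is an extra assumption you can drop:

```python
# Fullwidth forms U+FF01..U+FF5E sit exactly 0xFEE0 above ASCII U+0021..U+007E.
FULLWIDTH_OFFSET = 0xFEE0

# Lookup table from fullwidth ordinal to ASCII ordinal, as str.translate expects.
FULLWIDTH_TO_ASCII = {cp: cp - FULLWIDTH_OFFSET for cp in range(0xFF01, 0xFF5F)}

# Assumption: also fold the ideographic space U+3000 to a plain space.
FULLWIDTH_TO_ASCII[0x3000] = 0x20


def fullwidth_to_ascii(text):
    """Replace fullwidth ASCII variants only; every other character is left alone."""
    return text.translate(FULLWIDTH_TO_ASCII)


print(fullwidth_to_ascii("\uff13\uff16fsdfdsf\uff11\uff14"))  # 36fsdfdsf14
```

Unlike NFKC, this leaves ligatures, superscripts and the like untouched.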

On Python 3:

import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]

On Python 2:

import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]

In the Python 2 example, note the u prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
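
In Python 3 the decode step looks like the sketch below, assuming the data arrives as UTF-8 bytes. The raw variable is a made-up stand-in for whatever you read from a file or socket:

```python
import re

# Assumption: the input is UTF-8 encoded bytes; built here from escapes for the demo.
raw = "\uff13\uff16fsdfdsf\uff11\uff14".encode("utf-8")

# Decode to str first; digit handling is defined on text, not on raw bytes.
text = raw.decode("utf-8")

# On str patterns, \d matches any Unicode decimal digit, fullwidth ones included,
# and int() accepts those digits directly.
numbers = [int(x) for x in re.findall(r"\d+", text)]
first_two = int(text[:2])

print(numbers)    # [36, 14]
print(first_two)  # 36
```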
