Convert full-width Unicode characters to ASCII characters

Question

I have some string text in Unicode that contains some numbers, as shown below:

txt = '３６fsdfdsf１４'

However, int(txt[:2]) does not recognize the characters as a number. How can I change the characters so that they are recognized as numbers?

Answer 1

If you actually have Unicode (or decode your byte string to Unicode), then you can normalize the data with compatibility normalization (NFKC), which replaces the fullwidth characters with their ASCII equivalents:

>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
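
On Python 3, where every str is already Unicode, the same normalization might look like the sketch below. The helper name to_halfwidth is made up for illustration and is not part of unicodedata. The last two calls show that NFKC also rewrites other compatibility characters, which matters if you only want the digits changed.

```python
import unicodedata

def to_halfwidth(text):
    """Apply NFKC compatibility normalization (hypothetical helper name)."""
    return unicodedata.normalize('NFKC', text)

s = '３６fsdfdsf１４'
normalized = to_halfwidth(s)
print(normalized)            # 36fsdfdsf14
print(int(normalized[:2]))   # 36

# NFKC is not limited to digits: it also folds other compatibility characters.
print(to_halfwidth('Ａ'))    # A
print(to_halfwidth('x²'))    # x2
```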

If NFKC normalization changes too much for you, you can make a translation table of just the replacements you want:

#coding:utf8

repl = u'0123456789'

# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))

s = u'３６fsdfdsf１４'

print(s.translate(xlat))

Output:

36fsdfdsf14
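
If you want more than the digits, the same table idea can be widened to the whole fullwidth ASCII block. This is a Python 3 sketch, not part of the original answer. It assumes you want every character in U+FF01 to U+FF5E mapped to its ASCII counterpart (a fixed offset of 0xFEE0), plus the ideographic space. The name FULLWIDTH_TO_ASCII is chosen here for illustration.

```python
# Assumption: map the whole fullwidth ASCII block, not just the digits.
# U+FF01..U+FF5E correspond one-to-one to ASCII U+0021..U+007E (offset 0xFEE0).
FULLWIDTH_TO_ASCII = {cp: cp - 0xFEE0 for cp in range(0xFF01, 0xFF5F)}
# The ideographic space U+3000 is the wide counterpart of the ASCII space.
FULLWIDTH_TO_ASCII[0x3000] = 0x20

s = '３６fsdfdsf１４　ＡＢＣ！'
print(s.translate(FULLWIDTH_TO_ASCII))  # 36fsdfdsf14 ABC!
```

Unlike NFKC, this leaves every character outside the table untouched, so ligatures, superscripts, and accented letters pass through unchanged.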

Answer 2

On Python 3:

import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]

On Python 2:

import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]

In the Python 2 example, note the 'u' prefix in front of the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
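
The decode step and the regex can be combined into one small helper. This is a Python 3 sketch. The name extract_ints and the default 'utf8' encoding are assumptions for illustration, so adjust the encoding to match your data.

```python
import re

def extract_ints(text, encoding='utf8'):
    """Return all integer runs in text (hypothetical helper).

    Accepts bytes (decoded with the assumed encoding) or str.
    """
    if isinstance(text, bytes):
        text = text.decode(encoding)
    # For str patterns on Python 3, \d matches any Unicode decimal digit,
    # and int() accepts those digits as well.
    return [int(m) for m in re.findall(r'\d+', text)]

raw = '３６fsdfdsf１４'.encode('utf8')   # stand-in for a byte string read from elsewhere
print(extract_ints(raw))                # [36, 14]
print(extract_ints('３６fsdfdsf１４'))   # [36, 14]
```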
