I have some Unicode text containing numbers, as below:
txt = '３６fsdfdsf１４'
However, int(txt[:2]) does not recognize the characters as a number. How can I convert the characters so that they are recognized as numbers?
If you actually have Unicode (or decode your byte string to Unicode), then you can normalize the data with compatibility normalization (NFKC), which replaces the fullwidth digits with their ASCII equivalents:
>>> s = u'３６fsdfdsf１４'
>>> s
u'\uff13\uff16fsdfdsf\uff11\uff14'
>>> import unicodedata as ud
>>> ud.normalize('NFKC',s)
u'36fsdfdsf14'
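As a rough Python 3 sketch of the same idea, the snippet below normalizes the string and then converts the leading digits. The helper name `fold_compat` is only illustrative and not part of any library. It also shows that NFKC rewrites more than digits, for example ligatures and superscripts, which may or may not be what you want.

```python
import unicodedata

def fold_compat(text):
    """Apply NFKC, folding compatibility characters to their plain forms."""
    return unicodedata.normalize("NFKC", text)

# Fullwidth 3 and 6, ASCII letters, fullwidth 1 and 4.
sample = "\uff13\uff16fsdfdsf\uff11\uff14"
normalized = fold_compat(sample)
leading = int(normalized[:2])

# NFKC also changes characters that are not digits:
ligature = fold_compat("\ufb01")      # LATIN SMALL LIGATURE FI becomes "fi"
superscript = fold_compat("x\u00b2")  # SUPERSCRIPT TWO becomes a plain "2"
```

If those extra replacements are unwanted, a narrower mapping that touches only the digits is the safer choice.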
If compatibility normalization changes too much for you, you can make a translation table of just the replacements you want:
#coding:utf8
repl = u'0123456789'
# Fullwidth digits are U+FF10 to U+FF19.
# This makes a lookup table from Unicode ordinal to the ASCII character equivalent.
xlat = dict(zip(range(0xff10,0xff1a),repl))
s = u'３６fsdfdsf１４'
print(s.translate(xlat))
Output:
36fsdfdsf14
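For Python 3, the translation table above could be built along these lines. This is a minimal sketch; the name `FULLWIDTH_DIGITS` is an assumption made for illustration. It maps only U+FF10 to U+FF19, so other fullwidth characters are left alone.

```python
# Map each fullwidth digit ordinal (U+FF10..U+FF19) to the ordinal of the ASCII digit.
FULLWIDTH_DIGITS = {0xFF10 + i: ord("0") + i for i in range(10)}

sample = "\uff13\uff16fsdfdsf\uff11\uff14"
translated = sample.translate(FULLWIDTH_DIGITS)

# A fullwidth letter (U+FF21, fullwidth A) is not in the table and stays unchanged.
untouched = "\uff21".translate(FULLWIDTH_DIGITS)
```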
On Python 3:
import re
[int(x) for x in re.findall(r'\d+', '３６fsdfdsf１４')]
# [36, 14]
On Python 2:
import re
[int(x) for x in re.findall(r'\d+', u'３６fsdfdsf１４', re.U)]
# [36, 14]
In the Python 2 example, note the 'u' prefix on the string and the re.U flag. You can convert an existing str variable, such as txt in your question, to unicode with txt.decode('utf8').
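The Python 3 counterpart of that decode step might look like the sketch below, assuming the data arrives as UTF-8 encoded bytes (the variable `raw` is made up for the example). In Python 3, `\d` matches any Unicode decimal digit for str patterns by default, and `int()` accepts those digits directly; passing `re.ASCII` restricts `\d` to 0-9.

```python
import re

# Hypothetical input: the fullwidth text as UTF-8 bytes, e.g. read from a file or socket.
raw = "\uff13\uff16fsdfdsf\uff11\uff14".encode("utf-8")

text = raw.decode("utf-8")  # bytes -> str
numbers = [int(x) for x in re.findall(r"\d+", text)]

# With re.ASCII, \d only matches 0-9, so the fullwidth digits are not found.
ascii_only = re.findall(r"\d+", text, re.ASCII)
```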