
Extracting number from unicode string with regex

I have the following dictionary which contains some product data:

dictionary = {'price': [u'3\xa0590 EUR'],
              'name': [u'Product name with unicode chars']}

All values are in unicode. As you can see I'm using lists as dictionary values because sometimes I need to concatenate the information from several different sources.

I'm looking for a way to use a regex to extract the digits from the price value, dropping the non-breaking space (\xa0) and the currency at the end (EUR).

In this case I would like to see the following as a result:

3590

Can you please suggest a solution?

[SOLUTION]

Adding the solution here because the comments field wrapped my code unexpectedly:

I used the .sub() method from Python's re module, which replaces every match of a pattern with a given string. Here is the final code that gives me the expected result:

import re
p = re.compile(u'\xa0| EUR')  # non-breaking space or the trailing currency
result = p.sub(u'', dictionary['price'][0])  # u'3590'
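Because the dictionary values are lists, the same substitution can be applied to every entry and the result converted to an integer. This is only a minimal sketch: the helper name extract_price and the second sample price are made up for illustration, and it assumes every price ends in " EUR".

```python
import re

# Matches the non-breaking space and the trailing currency code.
# Assumption: prices always end in " EUR"; other currencies need another pattern.
PRICE_NOISE = re.compile(u'\xa0| EUR')

def extract_price(raw):
    """Hypothetical helper: strip the noise and return the price as an int."""
    return int(PRICE_NOISE.sub(u'', raw))

# The second entry is invented sample data, not from the original question.
dictionary = {'price': [u'3\xa0590 EUR', u'12\xa0000 EUR']}
prices = [extract_price(value) for value in dictionary['price']]
print(prices)
```

If a value contains anything other than digits after the substitution, int() raises ValueError, which makes unexpected formats visible instead of silently passing them through.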

Not sure about python, but here's a regex:

p = /\D/g;
s.replace(p, '');
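The snippet above is JavaScript. A rough Python equivalent, shown here as a sketch, uses re.sub with the same \D class, which matches any character that is not a digit (including \xa0, the regular space and the letters of EUR):

```python
import re

price = u'3\xa0590 EUR'
# \D matches every non-digit character, so only the digits survive.
digits = re.sub(r'\D', u'', price)
print(digits)  # 3590
```

Note that this approach also removes decimal separators, so it only suits whole-number prices like the one in the question.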

