
Python regex to extract positive and negative numbers between two special characters

I need to extract a numeric value from a string. The value may contain a comma, a decimal point, both, or neither.

For example:

1,921.15
921.15
921
1,921

I tried re.findall(r'[-+]?\d+[,.]?\d*',st)[3], but it extracts only 1,921 instead of 1,921.15.

From the string st below, I need to extract the value 1,921.15 using the re module:

st='["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

Expected = 1,921.15
Actual = 1,921

In general, when you want to extract positive or negative integers or floats from text with a Python regex, you can use the following pattern:

re.findall(r'[-+]?(?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?', text)

Note: the \d{1,3}(?:,\d{3})+ alternative matches integers that use a comma as the thousands separator. You can adjust it to the separator you need, e.g. \xA0 if the thousands separator is a non-breaking space, or \. if it is a dot.
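As a minimal sketch of the separator adjustment described above, the general pattern can be built for any thousands separator. The helper name number_pattern and the sample strings are hypothetical, not part of the original answer; the decimal point is assumed to stay a dot.

```python
import re

def number_pattern(thousands_sep=','):
    """Hypothetical helper: the general pattern for a given thousands separator."""
    sep = re.escape(thousands_sep)  # escape it in case the separator is a regex metacharacter
    return re.compile(r'[-+]?(?:\d{1,3}(?:' + sep + r'\d{3})+|\d+)(?:\.\d+)?')

# Comma as the thousands separator (the case from the question)
text = 'Balance 1,921.15 then -921.15 then +921 then 1,921'
comma_numbers = number_pattern(',').findall(text)
print(comma_numbers)

# Non-breaking space as the thousands separator
nbsp_numbers = number_pattern('\xa0').findall('Total: 1\xa0921.15')
print(nbsp_numbers)
```

Because the group inside the pattern is non-capturing, findall returns the whole match rather than a fragment of it.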

A few more options:

re.findall(r'[-+]?\d+(?:\.\d+)?', text) # Integer part is compulsory, e.g. 5.55
re.findall(r'[-+]?\d*\.?\d+', text)     # Also matches .57 or -.76
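To make the difference between these two options concrete, here is a small comparison on a made-up sample string (the text value is an assumption chosen only for illustration):

```python
import re

text = 'a 5.55 b .57 c -.76 d 42'

# Integer part is compulsory: a leading "." or "-." is not part of the match
strict = re.findall(r'[-+]?\d+(?:\.\d+)?', text)
print(strict)

# Integer part is optional: .57 and -.76 are matched in full
loose = re.findall(r'[-+]?\d*\.?\d+', text)
print(loose)
```

With the first pattern, .57 and -.76 lose their sign and decimal point and come back as 57 and 76, so pick it only if numbers in your data always have an integer part.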

Here, you want to extract any number between the > and < characters.

You may use

re.findall(r'>([-+]?\d[\d,.]*)<', text)


Details

  • > - a > char
  • ([-+]?\d[\d,.]*) - Group 1:
    • [-+]? - an optional - or +
    • \d - a digit
    • [\d,.]* - zero or more digits, commas, or dots
  • < - a < char

Python demo:

import re
st = '''
st='["FL gr_20 T3\'><strong>+1,921.15</strong>"]'
st='["FL gr_20 T3\'><strong>-921.15</strong>"]'
st='["FL gr_20 T3\'><strong>21.15</strong>"]'
st='["FL gr_20 T3\'><strong>1,11,921.15</strong>"]'
st='["FL gr_20 T3\'><strong>1,921</strong>"]'
st='["FL gr_20 T3\'><strong>112921</strong>"]'
st='["FL gr_20 T3\'><strong>1.15</strong>"]'
st='["FL gr_20 T3\'><strong>1</strong>"]'
'''
print(re.findall(r'>([-+]?\d[\d,.]*)<', st))
# => ['+1,921.15', '-921.15', '21.15', '1,11,921.15', '1,921', '112921', '1.15', '1']

Your regexp doesn't handle a number that contains both ',' and '.'. You could use the regexp below to match all the cases in the question:

re.findall(r'[-+]?\d+(?:,\d+)?(?:\.\d+)?', st)
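A short side-by-side run on the string from the question shows where the original pattern stops. The variable names before and after are only for illustration:

```python
import re

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'

# Pattern from the question: [,.]? allows only one separator,
# so the match ends after "1,921" and "15" becomes a separate match
before = re.findall(r'[-+]?\d+[,.]?\d*', st)
print(before)      # ['20', '10', '3', '1,921', '15']

# Separate optional groups for the comma part and the decimal part
after = re.findall(r'[-+]?\d+(?:,\d+)?(?:\.\d+)?', st)
print(after)       # ['20', '10', '3', '1,921.15']
print(after[3])    # 1,921.15
```

Note that this pattern allows a single comma group only, so a value such as 1,234,567.89 would need (?:,\d+)* instead of (?:,\d+)?.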

Once you have the strings, just strip out the commas and convert to float:

In [1]: l = ['1,921.15', '921.15', '921', '1,921']
   ...:

In [2]: l
Out[2]: ['1,921.15', '921.15', '921', '1,921']

In [3]: [float(x.replace(',','')) for x in l]
Out[3]: [1921.15, 921.15, 921.0, 1921.0]

If you really want to get rid of the trailing .0, use is_integer() to convert only the whole numbers to int:

In [4]: [int(f) if f.is_integer() else f for f in [float(x.replace(',','')) for x in l]]
Out[4]: [1921.15, 921.15, 921, 1921]
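Putting extraction and conversion together, a minimal sketch might look like the following. The function name extract_numbers is hypothetical, and it assumes the numbers sit between > and < as in the question, with commas used only as thousands separators:

```python
import re

def extract_numbers(text):
    """Hypothetical helper: numbers between > and <, converted to int or float."""
    values = []
    for raw in re.findall(r'>([-+]?\d[\d,.]*)<', text):
        value = float(raw.replace(',', ''))  # drop thousands separators first
        # keep whole numbers as int, everything else as float
        values.append(int(value) if value.is_integer() else value)
    return values

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
print(extract_numbers(st))
print(extract_numbers('<b>-921</b> <i>+1,921</i>'))
```

Since [\d,.]* is permissive, a malformed token such as 1.2.3 would also be captured and float() would raise ValueError on it; wrap the conversion in try/except if your input may contain such values.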

It looks like you're trying to capture any valid number format, so this would work:

[+-]?\d+(?:,\d{3})*(\.\d+)*

https://regex101.com/r/5bygVO/1
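One thing to watch when using this pattern from Python: it contains a capturing group, and re.findall returns the group's contents instead of the whole match when a pattern has exactly one group. A small sketch of the effect and one way around it, using re.finditer (variable names are illustrative only):

```python
import re

st = '["FL gr_20 PT10 MT3\'><strong>1,921.15</strong>"]'
pattern = re.compile(r'[+-]?\d+(?:,\d{3})*(\.\d+)*')

# findall returns only what group 1 captured (empty string if it did not take part)
groups_only = pattern.findall(st)
print(groups_only)     # ['', '', '', '.15']

# finditer with group(0) gives the full text of each match
full_matches = [m.group(0) for m in pattern.finditer(st)]
print(full_matches)    # ['20', '10', '3', '1,921.15']
```

Alternatively, turning the group into a non-capturing one, (?:\.\d+)*, lets findall return the full matches directly.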
