
Regex extracting negative numbers with unknown number format

I'm able to get the numbers out of this string:

string_p= 'seven 5 blah 6 decimal 6.5 thousands 8,999 with dollar signs $9,000 and $9,500,001.45 end ... lastly.... 8.4% now end'

with this code:

import re

def extractVal2(s,n):
    if n > 0:
        return re.findall(r'[0-9$,.%]+\d*', s)[n-1]
    else:
        return re.findall(r'[0-9$,.%]+\d*', s)[n]


for i in range(1,7):
    print extractVal2(string_p,i)

but it doesn't handle negative numbers. The negative numbers are the ones in parentheses:

string_n= 'seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)% now end'

I first tried replacing the parentheses with a negative sign, like so:

string_n= re.sub(r"\((\d*,?\d*)\)", r"-\1", string_n)

then tried each of these to get the negative numbers:

re.findall(r'[0-9$,.%-]+\d*', s)[n]
re.findall(r'[0-9$,.%]+-\d*', s)[n]
re.findall(r'[-0-9$,.%]+-\d*', s)[n]

I even tried a different approach:

words = string_n.split(" ")
for i in words:
    try:
        print -int(i.translate(None,"(),"))
    except:
        pass

You could change your regex to this:

import re

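def extractVal2(s,n):
    # optional $, optional (, one digit, then digits/commas/dots, optional ), optional %
    pattern = r'\$?\(?[0-9][0-9,.]*\)?%?'
    # n > 0 counts from 1; n <= 0 is used as a plain list index (so -1 is the last match)
    index = n-1 if n > 0 else n
    try:
        # "(5)" becomes "-5": "(" turns into the minus sign and ")" is dropped
        return re.findall(pattern, s)[index].replace("(","-").replace(")","")
    except IndexError:
        # fewer matches than requested
        return None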

string_n=  'seven (5) blah (6) decimal (6.5) thousands (8,999) with dollar ' + \
           'signs $(9,000) and $(9,500,001.45) end lastly.... (8.4)%'

for i in range(1,9): 
    print extractVal2(string_n,i)

It also parses 9,500,001.45, and it captures a ( that follows the $ and precedes the digits, then replaces it with a - sign. It's a hack, though: it does not check whether a ( has a matching ), and it would also capture "illegal" numbers like 2,200.200,22.

Output:

-5
-6
-6.5
-8,999
$-9,000
$-9,500,001.45
-8.4%
None
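
The caveat about an unmatched ( can be made concrete. The sketch below compares the pattern from the answer with a stricter variant that only accepts parentheses as a pair. The names LOOSE and BALANCED and the variant itself are assumptions for illustration, not part of the original answer, and the variant still accepts malformed digit groups.

```python
import re

# Pattern from the answer: "(" and ")" are each optional on their own.
LOOSE = r'\$?\(?[0-9][0-9,.]*\)?%?'

# Hypothetical stricter variant: either "(digits)" as a pair, or bare digits.
BALANCED = r'\$?(?:\([0-9][0-9,.]*\)|[0-9][0-9,.]*)%?'

text = 'lost (5 here'
loose_hits = re.findall(LOOSE, text)        # keeps the dangling "("
balanced_hits = re.findall(BALANCED, text)  # falls back to the bare digit

# Both patterns still accept oddly grouped digits such as 2,200.200,22.
malformed_hits = re.findall(BALANCED, 'bad 2,200.200,22 value')
```

With the loose pattern the dangling ( would later be turned into a minus sign, so 5 would be reported as -5. The balanced variant reports it as a plain 5 instead.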

You should also catch the IndexError that is raised when re.findall(..) captures nothing (or too few matches) and you index past the end of the returned list. The function above returns None in that case.
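
The values returned so far are still strings with $, commas and % in them. If you need actual numbers, something like the following might work. This is only a sketch: to_number is a hypothetical helper that is not in the original answer, it assumes US-style formatting (comma as thousands separator, dot as decimal point), and it simply drops the % sign without dividing by 100.

```python
def to_number(token):
    """Hypothetical helper: turn a token such as '$(9,000)' or '$-9,000' into a float."""
    negative = '(' in token              # accounting-style parentheses mean negative
    cleaned = token
    for ch in '$(),%':                   # drop currency, grouping and percent symbols
        cleaned = cleaned.replace(ch, '')
    value = float(cleaned)               # raises ValueError for input like '1.2.3'
    return -value if negative else value

raw = to_number('$(9,500,001.45)')       # raw regex match, parentheses still present
converted = to_number('$-9,000')         # string as returned by extractVal2
```

Because commas are removed blindly, a malformed token such as 2,200.200,22 would be converted without complaint, so validate the grouping first if that matters.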


The regex allows:

optional literal $      (escaped, so not interpreted as the end-of-string anchor in ^...$)
optional literal (
[0-9]                   one digit
[0-9,.]*                any number (possibly zero) of the included characters, in any order,
                        to the extent that it would match something like 000,9.34,2
optional literal )
optional literal %
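
The same breakdown can be kept next to the pattern itself by compiling it with re.VERBOSE, which ignores whitespace in the pattern and allows # comments. This is a sketch of the same regex as above, only laid out differently; the name PATTERN is an assumption for illustration.

```python
import re

PATTERN = re.compile(r'''
    \$?          # optional literal $
    \(?          # optional literal (
    [0-9]        # exactly one digit
    [0-9,.]*     # zero or more digits, commas or dots, in any order
    \)?          # optional literal )
    %?           # optional literal %
''', re.VERBOSE)

matches = PATTERN.findall('x $(9,000) (8.4)% 000,9.34,2')
```

The last token shows the looseness described above: 000,9.34,2 is accepted as a single match.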
