Say I have the following string: txt = "Balance: 47,124, age, ... Balance: 1,234..."
(Ellipses denote other text).
I want to use regex to find the list of balances, i.e. re.findall(r'Balance: (.*)', txt)
But I want to return just 47124 and 1234 instead of 47,124 and 1,234. Obviously I could strip the commas afterwards, but that seems like iterating through the string twice, thereby making this run twice as long.
I'd like to be able to output comma-less results while doing re.findall.
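One thing worth checking before choosing a pattern: `Balance: (.*)` is greedy, and `.` matches everything except a newline, so on a single-line string the first match swallows the rest of the text and findall returns only one item. A minimal sketch using the sample string from the question:

```python
import re

txt = "Balance: 47,124, age, ... Balance: 1,234..."

# Greedy `.*` runs to the end of the line, so the second "Balance:" is
# consumed by the first match instead of producing a match of its own.
greedy = re.findall(r'Balance: (.*)', txt)
print(greedy)  # ['47,124, age, ... Balance: 1,234...']
```

This is why the answers capture only digits (and commas) rather than `.*`.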
Try using the following regex pattern:
Balance: (\d{1,3}(?:,\d{3})*)
This captures only the comma-separated balance amount itself, not the text that follows it. Sample script:
import re
txt = "Balance: 47,124, age, ... Balance: 1,234, age ... Balance: 123, age"
amounts = re.findall(r'Balance: (\d{1,3}(?:,\d{3})*)', txt)
amounts = [a.replace(',', '') for a in amounts]
print(amounts)
['47124', '1234', '123']
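If the balances are needed as numbers rather than strings (an assumption; the question only shows the digits), the same pattern can feed int() directly. `parse_balances` is a hypothetical helper name used only for this sketch:

```python
import re

# Same pattern as above: 1-3 leading digits, then any number of ",ddd" groups.
BALANCE_RE = re.compile(r'Balance: (\d{1,3}(?:,\d{3})*)')

def parse_balances(text):
    """Return every balance in `text` as an int, with thousands commas removed.

    `parse_balances` is a hypothetical helper name for this example.
    """
    return [int(m.replace(',', '')) for m in BALANCE_RE.findall(text)]

txt = "Balance: 47,124, age, ... Balance: 1,234, age ... Balance: 123, age"
print(parse_balances(txt))  # [47124, 1234, 123]
```

Compiling the pattern once is optional here, since the re module caches recently used patterns, but it keeps the pattern in one named place.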
Here is how the regex pattern works:
\d{1,3} match an initial 1 to 3 digits
(?:,\d{3})* followed by `(,ddd)` zero or more times
So the pattern matches a leading group of 1 to 3 digits, optionally followed by one or more comma-separated groups of exactly three digits.
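The pattern's behaviour on a few edge cases can be checked with re.fullmatch, which succeeds only if the whole string fits. One consequence worth knowing: a balance written without commas, such as 1234, does not fit, and inside findall the match simply stops after the first three digits. The sample values below are made up for illustration:

```python
import re

AMOUNT = r'\d{1,3}(?:,\d{3})*'

# fullmatch requires the entire string to match the pattern.
for s in ["123", "47,124", "1,234,567", "1234", "1,23"]:
    print(s, bool(re.fullmatch(AMOUNT, s)))
# 123 True
# 47,124 True
# 1,234,567 True
# 1234 False
# 1,23 False

# Inside findall, an un-comma'd value is truncated rather than rejected.
print(re.findall(r'Balance: (' + AMOUNT + ')', "Balance: 1234"))  # ['123']
```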
Here's a way to do the replacements as you process each match, which might be slightly more efficient than collecting all the matches and then doing the replacements:
import re
txt = "Balance: 47,124, age, ... Balance: 1,234 ..."
balances = [bal.group(1).replace(',', '') for bal in re.finditer(r'Balance: ([\d,]+)', txt)]
print(balances)
Output:
['47124', '1234']
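Because `[\d,]+` is looser than the pattern in the first answer, it captures the trailing comma after 47,124 in the sample text, and it would also accept a run of commas with no digits at all. The `.replace(',', '')` call hides the first case; a short sketch showing the raw captures before the replacement:

```python
import re

txt = "Balance: 47,124, age, ... Balance: 1,234 ..."

# Raw captures before the commas are stripped.
raw = [m.group(1) for m in re.finditer(r'Balance: ([\d,]+)', txt)]
print(raw)  # ['47,124,', '1,234']

# A comma-only capture becomes an empty string after stripping.
empty = [m.group(1).replace(',', '') for m in re.finditer(r'Balance: ([\d,]+)', "Balance: ,, x")]
print(empty)  # ['']
```

If the input may contain such cases, the stricter `\d{1,3}(?:,\d{3})*` pattern avoids them.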