How can I remove commas while using regex.findall?

Question

Say I have the following string: txt = "Balance: 47,124, age, ... Balance: 1,234..."

(Ellipses denote other text).

I want to use regex to find the list of balances, i.e. re.findall(r'Balance: (.*)', txt)

But I want to return just 47124 and 1234 instead of 47,124 and 1,234. Obviously I could strip the commas afterwards, but that seems like iterating over the string twice, which would make this run twice as long.

I'd like to be able to output comma-less results while doing re.findall.

Answer 1

Try using the following regex pattern:

Balance: (\d{1,3}(?:,\d{3})*)

This will match only a comma-separated balance amount, and will not pick up on anything else. Sample script:

import re

txt = "Balance: 47,124, age, ... Balance: 1,234, age ... Balance: 123, age"
# Capture each comma-grouped amount, then strip the commas from the captures
amounts = re.findall(r'Balance: (\d{1,3}(?:,\d{3})*)', txt)
amounts = [a.replace(',', '') for a in amounts]
print(amounts)

['47124', '1234', '123']

Here is how the regex pattern works:

\d{1,3}      match an initial 1 to 3 digits
(?:,\d{3})*  followed by `(,ddd)` zero or more times

So the pattern matches one to three digits (0 to 999), optionally followed by one or more comma-separated groups of three digits.
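As a quick check of the explanation above, here is a minimal sketch that tests only the numeric part of the pattern against a few strings. The name AMOUNT and the sample values are made up for illustration, and re.fullmatch is used so that the whole string has to fit the pattern.

```python
import re

# Numeric part of the pattern above, without the "Balance: " prefix.
# AMOUNT is an illustrative name, not something from the original answer.
AMOUNT = re.compile(r'\d{1,3}(?:,\d{3})*')

samples = ["123", "1,234", "47,124", "1,234,567", "12,34", "1234"]

# fullmatch succeeds only if the entire string is a well-formed amount
results = {s: bool(AMOUNT.fullmatch(s)) for s in samples}
for s, ok in results.items():
    print(f"{s!r}: {ok}")
```

The first four samples should be accepted. "12,34" is rejected because the group after the comma is not three digits, and "1234" is rejected because more than three digits appear before any comma.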

Answer 2

Here's a way to do the replacements as you process each match, which might be slightly more efficient than collecting all the matches and then doing the replacements:

import re

txt = "Balance: 47,124, age, ... Balance: 1,234 ..."
# Strip the commas from each captured amount as the matches are produced
balances = [bal.group(1).replace(',', '') for bal in re.finditer(r'Balance: ([\d,]+)', txt)]
print(balances)

Output:

['47124', '1234']
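If the balances are needed as numbers rather than strings, the same idea can be wrapped in a small helper. This is only a sketch: parse_balances and BALANCE are hypothetical names, and it assumes the stricter thousands-group pattern from Answer 1 so that every capture is safe to pass to int(). Note that replace() runs only on the short captured substrings, not on the whole input text.

```python
import re

# Hypothetical helper names; the pattern is the stricter one from Answer 1.
BALANCE = re.compile(r'Balance: (\d{1,3}(?:,\d{3})*)')

def parse_balances(text):
    """Return each balance found in `text` as an int, commas removed."""
    # replace() touches only the captured amount, e.g. "47,124"
    return [int(m.group(1).replace(',', '')) for m in BALANCE.finditer(text)]

txt = "Balance: 47,124, age, ... Balance: 1,234 ..."
print(parse_balances(txt))  # [47124, 1234]
```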

2 answers

solution1
0 ACCEPTED 2019-10-28 02:39:03

solution2
0 2019-10-28 02:56:34
