I need to extract the first number from a string, but I don't know the exact format of the number.
The number could be in one of the following formats:
1.224 (some decimal)
3,455,000 (some number with an unknown number of commas)
45% (a percentage)
5 (or just an integer)
it would be something like blah blah $ 2,400
or blah blah 45%
or blah blah $1.23
or blah blah 7
It would be interesting if it were intelligent enough to handle word numbers too, like blah blah seven.
I don't need the dollar sign, just the number
While this problem has many cases, here is a solution that handles most of them using a regex and the re module:
import re

def extractVal(s):
    return re.sub(r'^[^0-9$\-]*| .*$', '', s)
(1) It removes all leading characters that are not a digit (0-9), $, or -
(2) It removes everything from the first remaining space to the end of the string (after (1))
Here's some data in action:
>>> data = ['blah $50,000 10', 'blah -1.224 blah', 'blah 3,455,000 blah', 'blah 45% 10 10 blah', '5 6 4']
>>> print(list(map(extractVal,data)))
['$50,000', '-1.224', '3,455,000', '45%', '5']
This solution assumes that the first number is followed by a space (or ends the string).
We can go further, as others have stated, by converting these strings into numbers:
def valToInt(s):
    if '%' in s:
        a = float(s[:-1])/100
    else:
        a = float(re.sub(r'[,$]','',s))
    return int(a) if a == int(a) else a
This results in the following (using map() again):
[50000, -1.224, 3455000, 0.45, 5]
If you insist on a regex, then this should work (limited to the cases you mentioned):
rgx = re.compile(r'\d+(,|\.)?\d*')
assert rgx.search("blah blah $ 2,400")
assert rgx.search("blah blah 45%")
assert rgx.search("blah blah $1.23")
assert rgx.search("blah blah 7")
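The asserts above only confirm that some match exists, not what text was matched. As a rough sketch (the NUMBER_RE pattern and the first_number_text helper are my own illustrative names, not part of the answer above), printing the matched text shows that the pattern stops after one comma group, and a slightly stricter pattern can pick up the whole number:

```python
import re

# Pattern from the answer above: digits, one optional "," or ".", more digits.
rgx = re.compile(r'\d+(,|\.)?\d*')

# Hypothetical alternative: digits, any number of ",ddd" groups,
# then an optional decimal part.
NUMBER_RE = re.compile(r'\d+(?:,\d{3})*(?:\.\d+)?')

def first_number_text(s, pattern=NUMBER_RE):
    """Return the text of the first match of pattern in s, or None."""
    m = pattern.search(s)
    return m.group() if m else None

s = "blah 3,455,000 blah"
print(first_number_text(s, rgx))   # 3,455
print(first_number_text(s))        # 3,455,000
```

Both patterns ignore the $ and % signs, so the result still has to be converted if an actual number is needed.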
As for the blah blah seven case, I do not think a regex would cut it (at least not for anything more complex than a single digit).
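For the single-word case, one possible sketch is a small lookup table combined with a regex alternation. The WORD_VALUES table, the TOKEN_RE pattern, and the first_number_or_word function are all hypothetical names for illustration. It assumes only the words zero through ten matter, does not handle compound forms like "twenty one", ignores % signs, and will happily treat the "one" in "no one" as a number:

```python
import re

# Hypothetical lookup table; extend as needed.
WORD_VALUES = {
    'zero': 0, 'one': 1, 'two': 2, 'three': 3, 'four': 4, 'five': 5,
    'six': 6, 'seven': 7, 'eight': 8, 'nine': 9, 'ten': 10,
}

# A token is either a digit-based number or one of the known words
# (matched as a whole word, case-insensitively).
TOKEN_RE = re.compile(
    r'\d+(?:,\d{3})*(?:\.\d+)?|\b(?:' + '|'.join(WORD_VALUES) + r')\b',
    re.IGNORECASE,
)

def first_number_or_word(s):
    """Return the first number (digits or a known word) in s, or None."""
    m = TOKEN_RE.search(s)
    if m is None:
        return None
    token = m.group().lower()
    if token in WORD_VALUES:
        return WORD_VALUES[token]
    value = float(token.replace(',', ''))
    return int(value) if value.is_integer() else value

print(first_number_or_word("blah blah seven"))  # 7
print(first_number_or_word("blah blah 7"))      # 7
```

Anything beyond single words probably needs a proper parser or a dedicated library rather than a longer regex.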
To extract the first number from a string when the format varies, you could use re.findall():
import re

strings = ['45% blah 43%', '1.224 blah 3.2', '3,455,000 blah 4,3', '$1.2 blah blah $ 2,400', '3 blah blah 7']
for string in strings:
    first_match = re.findall(r'[0-9$,.%]+\d*', string)[0]
    print(first_match)
Which outputs:
45%
1.224
3,455,000
$1.2
3
Assuming you want an actual number, and that percents should be converted to a decimal:
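str_ = "blah blah $ 2,400"
match = re.search(r"([0-9,.]+)\s*(%?)", str_)
# re.search returns None when nothing matches, so check before calling .groups()
number, is_percent = match.groups() if match else (None, None)
if number is not None:
    number = float(number.replace(",", ""))
    if is_percent:
        number /= 100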
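The same idea can be wrapped in a reusable function. This is only a sketch: the first_number name and the NUMBER_WITH_PERCENT pattern are my own. Unlike the pattern above, it requires the match to start with a digit so that stray commas or periods in the surrounding text are not picked up. That also means negative signs are ignored and ".5" would be read as 5.

```python
import re

# Hypothetical pattern: a digit, then digits/commas, an optional decimal
# part, optional whitespace, and an optional percent sign.
NUMBER_WITH_PERCENT = re.compile(r'(\d[\d,]*(?:\.\d+)?)\s*(%?)')

def first_number(text):
    """Return the first number in text as a float, or None if there is none.

    Percentages are converted to a fraction (45% becomes 0.45).
    """
    match = NUMBER_WITH_PERCENT.search(text)
    if match is None:
        return None
    digits, percent = match.groups()
    value = float(digits.replace(',', ''))
    return value / 100 if percent else value

for s in ["blah blah $ 2,400", "blah blah 45%", "blah blah $1.23", "blah blah 7"]:
    print(first_number(s))
```

Returning None instead of raising keeps the caller's handling of "no number found" explicit.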