I am scraping a website and extracting some values, from which I need only the numeric part. For example, if the string says "-14.32 kcal/mole", I want to get the float -14.32.
To do this, I am using the following code:
import re
number_string = '-9.2 kcal/mole'
number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()
print(number)
Output: -9.2
This works fine whenever the number in number_string is a float. But when the number is a negative integer, I get the positive value of that number.
For example,
import re
number_string = '-4 kcal/mole'
number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()
print(number)
Output: 4 (instead of -4)
| is the lowest-precedence operator, so your pattern splits into two alternatives. You are looking for an optionally signed number with a fractional part
[-+]?\d*\.\d+
or an unsigned integer
\d+
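To see the precedence problem directly, here is a small Python check (a sketch using only the sample strings from the question). The second string loses its sign because only the first alternative allows one:

```python
import re

# The question's pattern. "|" has the lowest precedence, so this parses as
# ([-+]?\d*\.\d+) OR (\d+): the optional sign belongs only to the first branch.
PATTERN = r"[-+]?\d*\.\d+|\d+"

samples = ["-9.2 kcal/mole", "-4 kcal/mole"]
results = {s: re.search(PATTERN, s).group() for s in samples}
print(results)  # {'-9.2 kcal/mole': '-9.2', '-4 kcal/mole': '4'}
```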
You need to wrap the two alternatives in a group so that the optional sign applies to either one:
[-+]?(?:\d*\.\d+|\d+)
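A minimal sketch of the grouped pattern in use. extract_number is a hypothetical helper name, and returning None when nothing matches is an assumption about what you want for strings without a number:

```python
import re

# Grouping the two alternatives makes the optional sign apply to both.
# (?:...) is a non-capturing group, so .group() still returns the whole match.
NUMBER = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")

def extract_number(text):
    """Hypothetical helper: return the first number in text as a float, or None."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None

print(extract_number("-9.2 kcal/mole"))    # -9.2
print(extract_number("-4 kcal/mole"))      # -4.0
print(extract_number("-14.32 kcal/mole"))  # -14.32
```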
or make the fractional part optional.
[-+]?\d+(?:\.\d+)?
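A quick check of the optional-fraction approach, assuming the pattern [-+]?\d+(?:\.\d+)? with an escaped dot. The comparison at the end shows why the dot needs escaping and why \d* is risky here: when every part is optional, the pattern can match an empty string, and an unescaped . matches any character. One trade-off of requiring \d+ first is that a value written as .5 would be read as 5.

```python
import re

# Optional fractional part, with the dot escaped so it only matches a literal
# decimal point. \d+ (not \d*) keeps the pattern from matching an empty string.
NUMBER = re.compile(r"[-+]?\d+(?:\.\d+)?")

samples = ["-9.2 kcal/mole", "-4 kcal/mole", "+7 kcal/mole", "about 3.5 units"]
values = [float(NUMBER.search(s).group()) for s in samples]
print(values)  # [-9.2, -4.0, 7.0, 3.5]

# For comparison, the version with an unescaped "." and \d*:
loose = re.compile(r"[-+]?\d*(?:.\d+)?")
print(repr(loose.search("abc -4").group()))  # '' (empty match at position 0)
print(loose.search("4x5").group())           # 4x5 ("." matched the "x")
```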
In both cases, I've used non-capturing groups (?:...) so the pattern still defines no capture groups; that keeps the semantics of the group-related match methods (and of re.findall) unchanged.
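To illustrate what the non-capturing group buys you, this sketch compares it with a plain capturing group (the text string is made-up sample input). With a single capturing group, re.findall returns only that group's contents, so the sign is lost; .group() returns the whole match either way.

```python
import re

text = "dG = -4 kcal/mole, dH = -14.32 kcal/mole"

capturing = r"[-+]?(\d*\.\d+|\d+)"        # plain (...) group
non_capturing = r"[-+]?(?:\d*\.\d+|\d+)"  # (?:...) group

# With exactly one capturing group, re.findall returns that group's text,
# so the sign (which sits outside the group) is dropped.
with_groups = re.findall(capturing, text)
without_groups = re.findall(non_capturing, text)
print(with_groups)     # ['4', '14.32']
print(without_groups)  # ['-4', '-14.32']

# .group() with no argument is group 0, the whole match, in both cases.
print(re.search(capturing, text).group())      # -4
print(re.search(non_capturing, text).group())  # -4
```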
I would use something like this:
[+-]?(?:\d*\.)?\d+
[+-]? - optional positive or negative sign
(?:\d*\.)? - optional leading digits followed by a decimal point
\d+ - required digits
https://regex101.com/r/WKPQ4h/1
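The same pattern can be written with re.VERBOSE so the breakdown above sits next to each piece. This is a sketch; apart from the question's strings, the samples are made up. Note that it also accepts a value with no leading digit, such as .5.

```python
import re

# Same pattern as above, written with re.VERBOSE so each piece carries a comment.
NUMBER = re.compile(r"""
    [+-]?        # optional positive or negative sign
    (?:\d*\.)?   # optional leading digits followed by a decimal point
    \d+          # required digits
""", re.VERBOSE)

samples = ["-9.2 kcal/mole", "-4 kcal/mole", ".5 kcal/mole", "+12 kcal/mole"]
values = [float(NUMBER.search(s).group()) for s in samples]
print(values)  # [-9.2, -4.0, 0.5, 12.0]
```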
Since you are scraping web content, note that this regex will match every number on the page.
You will probably want to target specific units of measurement:
[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))
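A sketch of the unit-targeted version with re.findall. page_text is invented sample input. The lookahead (?= ...) requires the unit to follow without including it in the match, and because the pattern has no capturing groups, findall returns the whole matched numbers, ready for float():

```python
import re

# Lookahead (?= ...) requires a unit after the number but leaves it out of the match.
PATTERN = re.compile(r"[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))")

# Hypothetical scraped text: "2" and "3" have no unit after them, so they are skipped.
page_text = "Run 2 of 3: binding energy -14.32 kcal/mole, strain +4 kcal/mole"
energies = [float(m) for m in PATTERN.findall(page_text)]
print(energies)  # [-14.32, 4.0]
```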
Your regular expression searches for [-+]?\d*\.\d+ or \d+, and only the first alternative allows a sign; that is why this happens. You can change your regular expression to something like [-+]?\d*\.\d+|[-+]?\d+ and that should give you the expected result.
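A quick check of this variant on the question's strings (a sketch; the optional sign is simply repeated in each alternative instead of grouping them):

```python
import re

# Repeat the optional sign in both alternatives instead of grouping them.
PATTERN = r"[-+]?\d*\.\d+|[-+]?\d+"

samples = ["-14.32 kcal/mole", "-9.2 kcal/mole", "-4 kcal/mole"]
numbers = [float(re.search(PATTERN, s).group()) for s in samples]
print(numbers)  # [-14.32, -9.2, -4.0]
```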