Regex returns negative integers as positive

Question

I am scraping a website and extracting some values, from which I need only the numeric part. For example, if the string says "-14.32 kcal/mole", I want to get the float -14.32.

To do this I am applying the following code:

import re

number_string = '-9.2 kcal/mole'


number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()

print(number)

Output: -9.2

Whenever number_string contains a float, it works fine. But when the number is a negative integer, I get the positive value of that number.

For example,

import re

number_string = '-4 kcal/mole'


number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()

print(number)

Output: 4 (instead of -4)

Answer 1

| has the lowest precedence of all regex operators, so your pattern matches either an optionally signed float

[-+]?\d*\.\d+

or an unsigned integer

\d+

You need to parenthesize the alternation that matches the absolute value so that the optional sign applies to both branches:

[-+]?(?:\d*\.\d+|\d+)

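or make the fractional part optional (the dot must be escaped, otherwise it matches any character):

[-+]?\d*(?:\.\d+)?

Because every part of this second pattern is optional, it can also match an empty string, so it is only safe when the number is at the start of the string.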

In both cases, I've used non-capture groups to avoid changing the semantics of the following call to the groups method.
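
As a rough illustration of the precedence issue, here is a small sketch that runs the original pattern and the grouped pattern on the two strings from the question. The variable names are made up for this example.

```python
import re

# Original pattern: the optional sign belongs to the float branch only.
original = r"[-+]?\d*\.\d+|\d+"
# Grouped pattern: the optional sign applies to both branches.
grouped = r"[-+]?(?:\d*\.\d+|\d+)"

results = {}
for text in ("-9.2 kcal/mole", "-4 kcal/mole"):
    results[text] = (
        re.search(original, text).group(),
        re.search(grouped, text).group(),
    )
    print(text, "->", results[text])

# -9.2 kcal/mole -> ('-9.2', '-9.2')
# -4 kcal/mole -> ('4', '-4')
```

With the original pattern, the integer case falls through to the second branch, which has no sign, so the match starts at the digit rather than at the minus.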

Answer 2

I would use something like this:

[+-]?(?:\d*\.)?\d+

[+-]? - optional positive or negative sign
(?:\d*\.)? - optional leading digits followed by a decimal point
\d+ - required digits

https://regex101.com/r/WKPQ4h/1

Since you are scraping web content, note that this regex will simply find every number in the text.

You will probably wish to target specific units of measurement:

[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))

https://regex101.com/r/FM5ZXJ/1
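
A minimal sketch of the unit-targeted approach, assuming a made-up sample string and only the kcal/mole unit; adjust the lookahead to whatever units actually appear on the page.

```python
import re

# Hypothetical scraped text, invented for illustration only.
text = "Binding energy: -4 kcal/mole, page 12, RMSD .85 A, score -14.32 kcal/mole"

# The lookahead requires " kcal/mole" after the number without consuming it.
pattern = re.compile(r"[+-]?(?:\d*\.)?\d+(?= kcal/mole)")

# With no capturing groups, findall returns the full text of each match.
values = [float(m) for m in pattern.findall(text)]
print(values)  # [-4.0, -14.32]
```

The page number and the RMSD value are skipped because they are not followed by the unit.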

Answer 3

Your regular expression searches for either [-+]?\d*\.\d+ or \d+, so the sign is only matched for floats. You can change your regular expression to something like [-+]?\d*\.\d+|[-+]?\d+, which should give your expected result.
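
One way to wrap this pattern, sketched under the assumption that you want a float back and None when the string has no number. The helper name extract_number is hypothetical, not something from the question.

```python
import re

# Pattern from this answer: the sign is repeated in both branches.
SIGNED_NUMBER = re.compile(r"[-+]?\d*\.\d+|[-+]?\d+")

def extract_number(text):
    """Return the first signed number in text as a float, or None if absent."""
    match = SIGNED_NUMBER.search(text)
    return float(match.group()) if match else None

print(extract_number("-4 kcal/mole"))    # -4.0
print(extract_number("-9.2 kcal/mole"))  # -9.2
print(extract_number("n/a"))             # None
```

Checking the match before calling group() avoids an AttributeError when a scraped field contains no number at all.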

3 answers

solution1
1 2022-01-11 15:50:44

solution2
1 2022-01-11 15:51:01

solution3
0 2022-01-11 15:47:39
