
Regex returns negative integers as positive

I am scraping a website and extracting some values, of which I only need the numeric part. For example, if the string says "-14.32 kcal/mole", I want to get the float -14.32.

To do this I am applying the following code:

import re

number_string = '-9.2 kcal/mole'


number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()

print(number)

Output: -9.2

Whenever number_string contains a float, it works fine. But when the number is a negative integer, I get the positive value of that number.

For example,

import re

number_string = '-4 kcal/mole'


number = re.search(r"[-+]?\d*\.\d+|\d+", number_string).group()

print(number)

Output: 4 (instead of -4)

The alternation operator | has the lowest precedence of any regex operator, so your pattern looks for either a signed number with a fractional part

[-+]?\d*\.\d+

or an unsigned integer

\d+
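To see the precedence problem concretely, the following check runs each alternative on its own against the sample string from the question. It only illustrates why re.search returns the leftmost successful match, "4", and does not fix anything yet.

```python
import re

text = '-4 kcal/mole'

# The original pattern: the optional sign belongs only to the first alternative.
original = r"[-+]?\d*\.\d+|\d+"

# Alternative 1 requires a decimal point, so it cannot match "-4" at all.
print(re.search(r"[-+]?\d*\.\d+", text))   # None

# Alternative 2 has no sign, so its match starts at the digit, not at the '-'.
print(re.search(r"\d+", text).group())     # 4

# Combined, the leftmost successful match is therefore "4".
print(re.search(original, text).group())   # 4
```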

You need to parenthesize the part that matches the absolute value, so that the optional sign applies to both alternatives:

[-+]?(?:\d*\.\d+|\d+)

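or make the fractional part optional. The dot has to be escaped (an unescaped . matches any character), and at least one integer digit is required so the pattern can never match the empty string; the trade-off is that a bare fraction such as .5 is not matched:

[-+]?\d+(?:\.\d+)?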

In both cases I've used non-capturing groups (?:...), so the pattern gains no capture groups: .group() still returns the whole match, and re.findall keeps returning whole matches rather than just the group contents.
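Since the question ultimately asks for a float, here is a minimal sketch that combines the parenthesized pattern with float(). The helper name extract_number is hypothetical, and returning None when no number is found is an assumption about what the caller wants.

```python
import re

# Parenthesized version: the optional sign now applies to both alternatives.
NUMBER = re.compile(r"[-+]?(?:\d*\.\d+|\d+)")

def extract_number(text):
    """Hypothetical helper: return the first number in text as a float, or None."""
    match = NUMBER.search(text)
    return float(match.group()) if match else None

for sample in ['-14.32 kcal/mole', '-9.2 kcal/mole', '-4 kcal/mole', '+7 kcal/mole', 'no value']:
    print(repr(sample), '->', extract_number(sample))
```

This prints -14.32, -9.2, -4.0, 7.0 and None for the five samples; float() accepts a leading + or - sign, so no extra handling is needed for it.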

I would use something like this:

[+-]?(?:\d*\.)?\d+
  • [+-]? - optional positive or negative sign
  • (?:\d*\.)? - optional leading digits followed by a decimal point
  • \d+ - required digits

https://regex101.com/r/WKPQ4h/1
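A quick way to check this pattern outside regex101 is to run it over a few sample strings. The inputs below are illustrative; note that a bare fraction such as .5 is accepted because the digits before the decimal point are optional.

```python
import re

# Optional sign, optional "digits + decimal point" prefix, then required digits.
PATTERN = r"[+-]?(?:\d*\.)?\d+"

samples = ['-14.32 kcal/mole', '-4 kcal/mole', '.5 kcal/mole', '+3 kcal/mole']
for s in samples:
    print(s, '->', float(re.search(PATTERN, s).group()))
```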


Since you are scraping web content, note that this regex will match every number on the page.

You will probably want to target specific units of measurement with a lookahead:

[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))

https://regex101.com/r/FM5ZXJ/1
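The lookahead (?= ...) checks that the unit follows without consuming it, so re.findall returns only the numbers. The page text below is made up for illustration; the unit list is the one from the pattern above.

```python
import re

# Keep only numbers immediately followed by a space and one of the listed units.
UNIT_NUMBER = re.compile(r"[+-]?(?:\d*\.)?\d+(?= (?:kcal/mole|butterflies))")

# Hypothetical scraped text mixing relevant and irrelevant numbers.
page_text = "Run 3 of 10: binding energy -14.32 kcal/mole, previous -4 kcal/mole, 2 butterflies"

# The pattern has no capture groups, so findall returns the whole matches.
values = [float(v) for v in UNIT_NUMBER.findall(page_text)]
print(values)   # [-14.32, -4.0, 2.0]
```

The 3 and 10 in "Run 3 of 10" are skipped because neither is followed by one of the units.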

Your regular expression is set up to search for [-+]?\d*\.\d+ or \d+, and only the first alternative allows a sign; that is why this happens. You can change your regular expression to something like [-+]?\d*\.\d+|[-+]?\d+, and that should give you the expected result.
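Repeating the sign in front of the integer alternative can be checked against the two inputs from the question:

```python
import re

# The optional sign is repeated in front of the integer alternative as well.
PATTERN = r"[-+]?\d*\.\d+|[-+]?\d+"

for s in ['-9.2 kcal/mole', '-4 kcal/mole']:
    print(s, '->', re.search(PATTERN, s).group())
```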

The technical posts on this site are published under the CC BY-SA 4.0 license. If you reprint them, please indicate the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 