
Regex to match scientific notation

I'm trying to match numbers in scientific notation (regex from here):

>>> import re
>>> scinot = re.compile(r'[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')
>>> re.findall(scinot, 'x = 1e4')
['1e4']
>>> re.findall(scinot, 'x = c1e4')
['1e4']

I'd like it to match 'x = 1e4' but not 'x = c1e4'. What should I change?

Update: The answer here has the same problem: it incorrectly matches 'x = c1e4'.

Add an anchor at the end of the regex, and require whitespace or an equals sign before the number:

[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$
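A minimal sketch of how this pattern behaves with re.findall, assuming Python 3's re module as in the question. Because the pattern has exactly one capturing group, findall returns only the captured number, so the leading ' = ' is dropped. The trailing $ also means only a number at the very end of the string can match; the third string is a made-up example illustrating that.

```python
import re

# Whitespace or '=' must precede the number, and the number must end the string.
anchored = re.compile(r'[\s=]+([+-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+))$')

# With one capturing group, findall returns only the captured number.
print(anchored.findall('x = 1e4'))           # ['1e4']
print(anchored.findall('x = c1e4'))          # []

# Because of the $ anchor, only the last number in the string can match.
print(anchored.findall('a = 1e4, b = 2e5'))  # ['2e5']
```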

Simply add [^\w]? to exclude all alphanumeric characters that precede your first digit:

 [+\-]?[^\w]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)

Technically, \w will also exclude numeric characters, but that's fine because the rest of your regex will catch them.

If you want to be truly rigorous, you can replace \w with A-Za-z:

 [+\-]?[^A-Za-z]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)
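A quick check of both patterns with re.findall on the question's two test strings. Because ? makes the negated class optional, the engine is free to match it against nothing, so it does not by itself forbid a letter before the digit; when it does match (here, the space), that character ends up in the result. For contrast, the last pattern swaps in a negative lookbehind (?<!\w), which is not part of the answer above and is sketched only as one possible alternative: an assertion cannot be skipped the way an optional class can.

```python
import re

# The two patterns from the answer above.
not_word = re.compile(r'[+\-]?[^\w]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')
not_letter = re.compile(r'[+\-]?[^A-Za-z]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')

for pattern in (not_word, not_letter):
    # The optional class can consume the space before the digit...
    print(pattern.findall('x = 1e4'))   # [' 1e4']
    # ...or match nothing, so a match can still start right after 'c'.
    print(pattern.findall('x = c1e4'))  # ['1e4']

# Not from the answer: a negative lookbehind rejects any number that is
# directly preceded by a word character (letter, digit or underscore).
lookbehind = re.compile(r'(?<!\w)[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)')
print(lookbehind.findall('x = 1e4'))   # ['1e4']
print(lookbehind.findall('x = c1e4'))  # []
```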

Another sneaky way is to simply add a space at the beginning of your regex: that forces every match to begin with a space.
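A minimal sketch of the leading-space trick, reusing the number pattern from the question. The literal space becomes part of each match, and a number at the very start of the string is missed. The second pattern is a hypothetical variant, not from the answer: it captures only the number and also accepts the start of the string in place of the space.

```python
import re

# The number pattern from the question, without any leading context.
number = r'[+\-]?(?:0|[1-9]\d*)(?:\.\d*)?(?:[eE][+\-]?\d+)'

# A literal leading space: the space becomes part of every match.
leading_space = re.compile(' ' + number)
print(leading_space.findall('x = 1e4'))   # [' 1e4']
print(leading_space.findall('x = c1e4'))  # []
print(leading_space.findall('1e4'))       # [] (no space before the number)

# Hypothetical variant: whitespace or start of string, capturing only the number.
space_or_start = re.compile(r'(?:^|\s)(' + number + ')')
print(space_or_start.findall('x = 1e4'))  # ['1e4']
print(space_or_start.findall('1e4'))      # ['1e4']
```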

import re
scinot = re.compile(r'[-+]?\d+\.?\d*[Ee](?:[-+]?\d+)?')

This regex should find all numbers written in scientific notation in the text.
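A short usage sketch, assuming the pattern is applied with re.findall; the sample text is made up for illustration. It picks up integers and decimals with either e or E and an optional signed exponent. Like the question's original pattern, it does not check the character before the digits, so it still finds '1e4' inside 'c1e4'.

```python
import re

# Same pattern as in the answer, written as a raw string (equivalent to '\\d').
scinot = re.compile(r'[-+]?\d+\.?\d*[Ee](?:[-+]?\d+)?')

text = 'a = 1e4, b = -2.5E-3, c = 6.02e23'
print(scinot.findall(text))        # ['1e4', '-2.5E-3', '6.02e23']

# It does not look at what precedes the digits.
print(scinot.findall('x = c1e4'))  # ['1e4']
```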

By the way, here is a link to a similar question: Extract scientific number from string
