
Extracting text using regex in Python

How do I extract roundUp(...) using a regex (or some similar approach) from each of the following possible variations:

[[[ roundUp( 10.0 ) ]]]
[[[ roundUp( 10.0 + 2.0 ) ]]]
[[[ roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]

The reason I'm asking is that I would like to replace roundUp(...) with math.ceil((...)*100)/100.0 in my code, but I'm not too sure how to do it, because brackets may be nested inside the call to control operator precedence.

This is Python, so why not just rebind the name roundUp:

import math

def my_roundup(x):
    # round x up to two decimal places
    return math.ceil(x * 100) / 100.0

roundUp = my_roundup
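To make the rebinding idea concrete, here is a minimal sketch. It assumes (the question does not say this) that the [[[ ... ]]] blocks are Python expressions that your own code passes to eval; evaluate_blocks and NAMESPACE are hypothetical names. Supplying roundUp in the evaluation namespace is the rebinding step, so the template text never has to be rewritten.

```python
import math
import re


def my_roundup(x):
    # Round x up to two decimal places
    return math.ceil(x * 100) / 100.0


# Assumption: each [[[ ... ]]] block is a Python expression evaluated by your
# own code. Binding roundUp in the eval namespace is the "rebind" step.
NAMESPACE = {"__builtins__": {}, "roundUp": my_roundup}


def evaluate_blocks(text):
    # Hypothetical helper: replace every [[[ expr ]]] with str(eval(expr))
    return re.sub(
        r"\[\[\[(.*?)\]\]\]",
        lambda m: str(eval(m.group(1).strip(), NAMESPACE)),
        text,
    )


print(evaluate_blocks("[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]"))  # prints 52.0
```

Because eval runs arbitrary expressions, this is only appropriate for templates you trust.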

You can't solve the general case with regular expressions. Regular expressions are not powerful enough to represent anything analogous to a stack, such as parentheses or XML tags nested to arbitrary depth.
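A small sketch of why one plain pattern is not enough: on a hypothetical line (not one of the question's examples) that has another parenthesised group after the call, a non-greedy pattern stops too early and a greedy one runs too far. The correct argument here is ' (10.0 * 2.0) + 2.0 '.

```python
import re

# Hypothetical input: a second (...) group follows the roundUp(...) call
line = "[[[ roundUp( (10.0 * 2.0) + 2.0 ) + (1.0 * 3.0) ]]]"

# Non-greedy: stops at the first ')', which closes (10.0 * 2.0)
lazy = re.search(r"roundUp\((.*?)\)", line).group(1)

# Greedy: runs on to the last ')' on the line, past roundUp's own ')'
greedy = re.search(r"roundUp\((.*)\)", line).group(1)

print(repr(lazy))    # ' (10.0 * 2.0'
print(repr(greedy))  # ' (10.0 * 2.0) + 2.0 ) + (1.0 * 3.0'
```

Neither pattern knows which ')' pairs with which '(', which is exactly the counting that a regex cannot do for arbitrary depth.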

If you are solving the problem in Python, you can do something like this:

import re

def roundup_sub(m):
    # m.group(1) is everything after "roundUp(" to the end of the text;
    # walk it, tracking nesting depth, to find the ')' that closes roundUp(
    rest = m.group(1)
    close_paren_index = None
    level = 1
    for i, c in enumerate(rest):
        if c == ')':
            level -= 1
        if level == 0:
            close_paren_index = i
            break
        if c == '(':
            level += 1
    if close_paren_index is None:
        raise ValueError("Unclosed roundUp()")
    # wrap the whole argument, drop roundUp's own closing ')',
    # and keep everything after it
    return ('math.ceil((' + rest[:close_paren_index] + ')*100)/100.0'
            + rest[close_paren_index + 1:])

def replace_every_roundup(text):
    # the greedy (.*) runs to the end of the text, so each re.sub call
    # rewrites only the first remaining roundUp(; repeat until nothing changes
    while True:
        new_text = re.sub(r'(?s)roundUp\((.*)', roundup_sub, text)
        if new_text == text:
            return text
        text = new_text

This uses the repl=function form of re.sub: a regex finds where each roundUp( begins, and ordinary Python code counts the parentheses to decide where the call ends, and therefore where the substitution stops.

An example of using them:

my_text = """[[[ roundUp( 10.0 ) ]]]
[[[ roundUp( 10.0 + 2.0 ) ]]]
[[[ roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]"""
print(replace_every_roundup(my_text))

which prints:

[[[ math.ceil(( 10.0 )*100)/100.0 ]]]
[[[ math.ceil(( 10.0 + 2.0 )*100)/100.0 ]]]
[[[ math.ceil(( (10.0 * 2.0) + 2.0 )*100)/100.0 ]]]
[[[ 10.0 + math.ceil(( (10.0 * 2.0) + 2.0 )*100)/100.0 ]]]
[[[ 10.0 + math.ceil(( (10.0 * 2.0) + 2.0 )*100)/100.0 + 20.0 ]]]

Another option would be to implement a regex that handles up to a certain depth of nested parentheses.
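A minimal sketch of that idea, assuming a fixed maximum nesting depth is acceptable. The helpers nested_parens_body and replace_roundup_bounded are hypothetical names: the pattern is built by wrapping the previous level in \( ... \), so depth 2 accepts arguments such as (10.0 * 2.0) + 2.0 but leaves anything nested deeper untouched.

```python
import re


def nested_parens_body(depth):
    # Regex for text whose parentheses are balanced and nested at most
    # `depth` levels deep (depth 0 means no parentheses at all)
    body = r"[^()]*"
    for _ in range(depth):
        # one more level: single plain characters, or a (...) group
        # whose contents match the previous level
        body = r"(?:[^()]|\(" + body + r"\))*"
    return body


def replace_roundup_bounded(text, depth=2):
    # Hypothetical helper: rewrites roundUp(...) calls whose argument nests
    # at most `depth` levels of parentheses; deeper calls stay unchanged
    pattern = re.compile(r"roundUp\((" + nested_parens_body(depth) + r")\)")
    return pattern.sub(lambda m: "math.ceil((" + m.group(1) + ")*100)/100.0", text)


print(replace_roundup_bounded("[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]"))
```

The pattern grows with every extra level, and a single pass does not rewrite a roundUp( that sits inside another roundUp's argument; calling it again until the text stops changing handles that case.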

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM