Parsing a block of mathematical expressions and separating the terms

In a text file, I have a block of text between two keywords (let's call them "keyword1" and "keyword2"). The block is one big mathematical expression: a sum of smaller expressions that can be more or less complex. Names of the form x"random_number" refer to numbered variables. For example, it could look like this:

keyword1 x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34 + ...
+ .....
+ .....

keyword2

All I want to do is split this big mathematical expression into the terms it is composed of and store these "atomic" terms in a list, for example, with one entry for every term that appears in the sum (if a term is negative, the entry should be - term).

With the example above, this should return this:

L = [x47*ln(x46+2*x38), (x35*x24 + exp(x87 + x56))^2, - x34, ...]

I would try to use a regex that matches the + or - symbols separating the terms, but I think this is wrong because it will also match the +/- symbols that appear inside the smaller expressions, which I don't want to be split.

So I'm a bit stuck on this.

Thank you in advance for helping me solve my problem guys

I think for extracting the part between the keywords, a regex will work just fine. With the help of an online regex creator you should be able to create that. Then you have the string left with the mathematical formula in it.
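As a rough sketch of that extraction step, something like the following could work. The function name extract_block, the sample text, and the extra term x12*x13 in it are made up for illustration; only the two keywords come from the question.

```python
import re

def extract_block(text, start="keyword1", end="keyword2"):
    """Return the text between `start` and `end`, or None if not found."""
    # re.DOTALL lets '.' match newlines, since the expression spans several lines
    pattern = re.escape(start) + r"(.*?)" + re.escape(end)
    match = re.search(pattern, text, flags=re.DOTALL)
    if match is None:
        return None
    # collapse line breaks and runs of whitespace into single spaces
    return " ".join(match.group(1).split())

# hypothetical sample input, not the asker's real file
sample = """keyword1 x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34
+ x12*x13

keyword2"""

print(extract_block(sample))
# x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34 + x12*x13
```

The non-greedy (.*?) stops at the first occurrence of the closing keyword, which matters if the file contains more than one such block.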

Essentially what you want is to split the string at all places where the bracket 'depth' is 0. For example, if you have x1*(x2+x3)+x4 the + between the brackets should be ignored.
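To make that concrete, here is a small sketch on the same example. It first shows what a naive regex split does, then records the bracket depth at every + or - sign. The variable names are only illustrative.

```python
import re

expr = "x1*(x2+x3)+x4"

# naive approach: split on every sign, which also cuts inside the brackets
naive_parts = re.split(r"[+-]", expr)
print(naive_parts)  # ['x1*(x2', 'x3)', 'x4']

# record (index, sign, depth) for every sign in the string
signs = []
depth = 0
for index, char in enumerate(expr):
    if char == "(":
        depth += 1
    elif char == ")":
        depth -= 1
    elif char in "+-":
        signs.append((index, char, depth))

print(signs)  # [(6, '+', 1), (10, '+', 0)]
```

Only the sign at index 10 has depth 0, so that is the only place where the string should be split.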

I wrote the following function, which scans through the string and keeps track of the current bracket depth. If the depth is 0 and a + or - is encountered, the index is stored. In the end, we can split the string at these indices to obtain the split you require. I first wrote a recursive variant, but the iterative variant works just as well and is probably easier to understand.

Recursive function

def find_split_indexes(block, index=0, depth=0, indexes=None):
    # create a fresh list for each top-level call; a mutable default ([])
    # would be shared between calls and keep the indexes of earlier calls
    if indexes is None:
        indexes = []

    # return when the string has been searched entirely
    if index >= len(block):
        return indexes

    # change the depth when a bracket is encountered
    if block[index] == '(':
        depth += 1
    elif block[index] == ')':
        depth -= 1

    # if a + or - is encountered at depth 0, store the index
    if depth == 0 and (block[index] == '+' or block[index] == '-'):
        indexes.append(index)

    # continue with the next character
    return find_split_indexes(block, index+1, depth, indexes)

Iterative function

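Of course an iterative (loop-based) version of this function can also be created, and it is likely a bit simpler to understand. It also avoids Python's recursion limit (1000 frames by default), which the recursive version would hit on long expressions because it makes one call per character.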

def find_split_indexes_iterative(block):
    indexes = []
    depth = 0

    # iterate over the string, character by character
    for index, char in enumerate(block):
        if char == '(':
            depth += 1
        elif char == ')':
            depth -= 1
        elif depth == 0 and char in '+-':
            indexes.append(index)
    return indexes

Using the indices

To then use these indices, you can, for instance, split the string as explained in this other question to obtain the parts you want. The only thing left to do is remove the leading and trailing spaces.
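One possible way to do that last step is sketched below. The name split_terms is made up here, and the scan from the iterative function above is repeated inside it so the snippet runs on its own. Dropping the leading + while keeping a leading - is an assumption based on the output format asked for in the question.

```python
def split_terms(block):
    """Split `block` into its top-level terms, keeping a leading minus sign."""
    # same scan as the iterative function, repeated to keep this self-contained
    indexes = []
    depth = 0
    for index, char in enumerate(block):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
        elif depth == 0 and char in "+-":
            indexes.append(index)

    # cut the string at every stored index; each piece starts with its sign
    bounds = [0] + indexes + [len(block)]
    pieces = [block[a:b].strip() for a, b in zip(bounds, bounds[1:])]

    terms = []
    for piece in pieces:
        if piece.startswith("+"):
            piece = piece[1:].strip()  # drop the redundant plus sign
        if piece:                      # skip the empty piece before a leading sign
            terms.append(piece)
    return terms

expr = "x47*ln(x46+2*x38) + (x35*x24 + exp(x87 + x56))^2 - x34"
print(split_terms(expr))
# ['x47*ln(x46+2*x38)', '(x35*x24 + exp(x87 + x56))^2', '- x34']
```

Note that this treats every + or - at depth 0 as a term separator, so a sign that directly follows another operator outside any brackets (for example x1*-x2) or a number written like 1e-5 would be split incorrectly and would need extra handling.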
