How to introduce extra brackets into mathematical expressions in Python

Question

I'm working on a project to implement infix-to-postfix transformations in Python.

The implementation works as long as the expression is fully bracketed. It cannot handle expressions where humans would implicitly assume an order of operations.

For instance, I can use a fully-bracketed expression such as:

((3+15)*2)+(6-3)

And get the right result.

However, humans might normally write:

(3+15)*2+(6-3)

Where the first outer pair of brackets is implied.

Are there any algorithms that can add the missing brackets correctly? If not, is there a best-practice solution for handling this sort of problem?

Update:

Here is the implementation of the parse tree function:

class BinaryTree:
   def __init__(self, root):
      self.key = root
      self.left_child = None
      self.right_child = None
   def insert_left(self, new_node):
      if self.left_child is None:
         self.left_child = BinaryTree(new_node)
      else:
         t = BinaryTree(new_node)
         t.left_child = self.left_child
         self.left_child = t
   def insert_right(self, new_node):
      if self.right_child is None:
         self.right_child = BinaryTree(new_node)
      else:
         t = BinaryTree(new_node)
         t.right_child = self.right_child
         self.right_child = t
   def get_right_child(self):
      return self.right_child
   def get_left_child(self):
      return self.left_child
   def set_root_val(self, obj):
      self.key = obj
   def get_root_val(self):
      return self.key

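# Needs "import re" and a Stack class with push() and pop() (not shown here).
def build_parse_tree(fp_exp):
   # "-" goes first in the character class so that "+-/" is not read as a range.
   fp_list = re.findall(r'[-+*/()]|\d+', fp_exp)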
   p_stack = Stack()
   e_tree = BinaryTree('')
   p_stack.push(e_tree)
   current_tree = e_tree
   for i in fp_list:
        if i == '(':
            current_tree.insert_left('')
            p_stack.push(current_tree)
            current_tree = current_tree.get_left_child()
        elif i not in ['+', '-', '*', '/', ')']:
            current_tree.set_root_val(int(i))
            parent = p_stack.pop()
            current_tree = parent
        elif i in ['+', '-', '*', '/']:
            current_tree.set_root_val(i)
            current_tree.insert_right('')
            p_stack.push(current_tree)
            current_tree = current_tree.get_right_child()
        elif i == ')':
            current_tree = p_stack.pop()
        else:
            raise ValueError
   return e_tree

def postorder(tree):
  if tree is not None:
      postorder(tree.get_left_child())
      postorder(tree.get_right_child())
      print(tree.get_root_val())

The output from the second expression postorder:

The one with the first (correct) is:

Answer 1

Disclaimer: Sorry for the long response; I was curious and simply wrote down what I did while experimenting with your question. The entire code is here: https://gist.github.com/jbndlr/3657fa890539d29c9e4b0311dc60835d

By the way, this is just test code and not meant to be used in production, as it may still be flawed.

Response

Your sequence parsing and tree setup with pushes of empty strings seems a bit odd, but I cannot accurately point to your error. Your parsing somehow swallows the * operator, probably because its left element is a closing bracket.

While playing around with this, I tried to reproduce the problem and came up with a solution that correctly parses simple expressions and can generate the required parentheses. Even if that is no longer required once the tree is parsed correctly, you can use it to generate fully bracketed expressions, or extend it to your own needs.

Preparation: The Imports

from __future__ import print_function

import re

Step 1: Tokenizing the Input

This function takes an expression string and generates a list of tuples representing its tokens. It also classifies each token with a simple string-represented type for later processing.

def tokenize(expression):
    '''Generate tokens from a string following fixed rules.
    '''
    scanner = re.Scanner([
        (r'[0-9]+\.[0-9]+', lambda _, t: ('FLOAT', t)),
        (r'[0-9]+', lambda _, t: ('INTEGER', t)),
        (r'[a-z_]+', lambda _, t: ('IDENTIFIER', t)),
        (r'\(', lambda _, t: ('P_OPEN', t)),
        (r'\)', lambda _, t: ('P_CLOSE', t)),
        (r'[+\-*/]', lambda _, t: ('OPERATOR', t)),
        (r'\s+', None),
    ])
    tokens, _ = scanner.scan(expression)
    return tokens

This approach is far from complete, but it is sufficient for demonstrating how to build binary parse trees. Note that the order of the rules matters: the scanner tries them in the order given, so putting INTEGER before FLOAT would consume only the 1 of 1.5 as an INTEGER and then stop at the dot, which no rule matches.
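
To see what the tokenizer produces, here is a small self-contained sketch. It repeats the rules from this step, uses `[0-9]+\.[0-9]+` for FLOAT so that multi-digit integer parts are accepted, and additionally raises on leftover input, which is my own assumption about useful behaviour rather than part of the answer's code. Note that `re.Scanner` is an undocumented part of the standard `re` module.

```python
import re


def tokenize(expression):
    '''Generate (type, text) tokens from a string following fixed rules.'''
    scanner = re.Scanner([
        (r'[0-9]+\.[0-9]+', lambda _, t: ('FLOAT', t)),
        (r'[0-9]+', lambda _, t: ('INTEGER', t)),
        (r'[a-z_]+', lambda _, t: ('IDENTIFIER', t)),
        (r'\(', lambda _, t: ('P_OPEN', t)),
        (r'\)', lambda _, t: ('P_CLOSE', t)),
        (r'[+\-*/]', lambda _, t: ('OPERATOR', t)),
        (r'\s+', None),  # whitespace is matched but produces no token
    ])
    tokens, remainder = scanner.scan(expression)
    if remainder:
        # Assumption: unmatched input is treated as an error instead of
        # being dropped silently.
        raise ValueError('Unexpected input: {!r}'.format(remainder))
    return tokens


tokens = tokenize('(a + 12.5) * 2')
for token in tokens:
    print(token)
# ('P_OPEN', '(')
# ('IDENTIFIER', 'a')
# ('OPERATOR', '+')
# ('FLOAT', '12.5')
# ('P_CLOSE', ')')
# ('OPERATOR', '*')
# ('INTEGER', '2')
```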

Step 2: Parse the Hierarchy

The next function takes a list of tokens as generated in Step 1 and resolves all parts that are put into brackets as individual sub-lists. The result is a nested list where each previously bracketed part is shifted to a deeper level.

def parse(tokens, in_parens=False):
    '''Parse a list of tokens that may contain brackets into a token hierarchy
    where all brackets are removed and replaced by list nesting.
    '''
    cur = []

    i = 0
    while i < len(tokens):
        t = tokens[i]
        if t[0] == 'P_OPEN':
            # If we hit an opening bracket, we memorize its position and search
            # for the corresponding closing one by counting the stacked
            # brackets.
            pos_open = i
            pos_close = None

            par_stack = 0
            for j, p in enumerate(tokens[i:]):
                if p[0] == 'P_OPEN':
                    # Deeper nesting, increase.
                    par_stack += 1
                elif p[0] == 'P_CLOSE':
                    # Level closed, decrease.
                    par_stack -= 1
                if par_stack == 0:
                    # If we hit level 0, we found the corresponding closing
                    # bracket for the opening one.
                    pos_close = i + j
                    break

            if pos_close is None:
                # If we did not find a corresponding closing bracket, there
                # must be some syntax error.
                raise Exception('Syntax error; missing closing bracket.')

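            # For the bracketed subset we just found, we invoke a recursive
            # parsing for its contents and append the result to our hierarchy.
            # Slice and jump with the absolute index pos_close; j is relative
            # to tokens[i:] and only equals pos_close when i == 0.
            elem = parse(tokens[i + 1:pos_close], in_parens=True)
            cur.append(elem)
            i = pos_close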
        elif t[0] == 'P_CLOSE':
            if not in_parens:
                # If we hit a closing bracket but are not searching for one, we
                # found too many closing brackets, which is a syntax error.
                raise Exception('Syntax error; too many closing brackets.')
            return cur
        else:
            cur.append(t)
        i += 1
    return cur

This makes sure that we do not miss the explicit grouping given by parentheses in the expression. At the same time, as we count parenthesis levels, we can spot syntax errors that result from wrong bracket counts.
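
To make the shape of the resulting hierarchy concrete, here is a minimal sketch of the same nesting idea. It is not the function from this step: it swaps the recursive slicing for an explicit stack of lists, and the name `nest` is hypothetical. The tokens are written by hand in the (type, text) format from Step 1.

```python
def nest(tokens):
    '''Replace bracket tokens by list nesting (stack-based variant).'''
    stack = [[]]  # stack[-1] is the list currently being filled
    for token in tokens:
        kind = token[0]
        if kind == 'P_OPEN':
            stack.append([])  # open a deeper level
        elif kind == 'P_CLOSE':
            if len(stack) == 1:
                raise ValueError('Syntax error; too many closing brackets.')
            finished = stack.pop()
            stack[-1].append(finished)  # attach the closed level to its parent
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise ValueError('Syntax error; missing closing bracket.')
    return stack[0]


# Tokens for (1-2)/(0+5)
tokens = [
    ('P_OPEN', '('), ('INTEGER', '1'), ('OPERATOR', '-'), ('INTEGER', '2'),
    ('P_CLOSE', ')'), ('OPERATOR', '/'),
    ('P_OPEN', '('), ('INTEGER', '0'), ('OPERATOR', '+'), ('INTEGER', '5'),
    ('P_CLOSE', ')'),
]
hierarchy = nest(tokens)
print(hierarchy)
```

Each bracketed part becomes its own sub-list, and the `/` operator stays on the top level between the two sub-lists.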

Step 3: Build a Tree

In order to proceed, we need to build an actual binary tree from our hierarchy. The hierarchy we got from Step 2 still has all un-bracketed chained operators on the same level, so we do not yet know the order in which the operators need to be executed. This is what is solved now.

When creating a new Node from a hierarchy (i.e., a nested list of tokens), we search for a pivot element that we can use as the operator of the Node being built. We choose the weakest-binding operator because we build the tree top-down, but it is evaluated bottom-up. Hence, the operation that is performed last is the one we want in the topmost Node of our tree.

class Node(object):
    def __init__(self, hierarchy, parent=None):
        # Unwrap redundant nesting such as ((1+2)); the loop handles any depth.
        while len(hierarchy) == 1 and isinstance(hierarchy[0], list):
            hierarchy = hierarchy[0]  # Bracketed descent

        # Find position of operator that has the weakest binding priority and
        # use it as pivot element to split the sequence at. The weakest binding
        # is executed last, so it's the topmost node in the tree (which is
        # evaluated bottom-up).
        pivot = self._weakest_binding_position(hierarchy)

        if pivot is not None:
            self.left = Node(hierarchy[:pivot], parent=self)
            self.op = hierarchy[pivot][1]
            self.right = Node(hierarchy[pivot + 1:], parent=self)
        else:
            # There is no pivot element if there is no operator in our
            # hierarchy. If so, we hit an atomic item and this node will be
            # a leaf node.
            self.value = hierarchy[0]

    def _binding_order(self, operator):
        '''Resolve operator to its binding order.'''
        if operator in '+-':
            return 1
        elif operator in '*/':
            return 2
        raise Exception('Parsing error; operator binding cannot be assessed.')

    def _weakest_binding_position(self, tokens):
        '''Return position of operator with weakest binding (maintains LTR).'''
        ops = sorted([
            (i, self._binding_order(t[1]))
            for i, t in enumerate(tokens)
            if t[0] == 'OPERATOR'
        ], key=lambda e: e[1], reverse=True)
        if len(ops) == 0:
            if len(tokens) != 1:
                raise Exception('Parsing error; found sequence w/o operator.')
            return None
        return ops[-1][0]

    def isleaf(self):
        if hasattr(self, 'value'):
            return True
        return False

    def __str__(self):
        if self.isleaf():
            return str(self.value[1])
        else:
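            # Plain '{}' calls str() on the child nodes; '{:s}' raises a
            # TypeError for non-string objects on Python 3.
            return '({} {} {})'.format(self.left, self.op, self.right)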

If you want to see how the tree is set up, add print(self) at the end of Node.__init__(). This gives you a bottom-up print of all nodes.
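
The pivot rule from this step can also be tried in isolation. The following is a simplified sketch, assuming a flat list of plain strings without brackets; the names `PRECEDENCE`, `weakest_binding_position` and `bracket` are hypothetical helpers, not part of the Node class.

```python
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}


def weakest_binding_position(symbols):
    '''Index of the rightmost operator with the lowest precedence, or None.'''
    best = None
    for i, sym in enumerate(symbols):
        if sym in PRECEDENCE:
            # "<=" prefers later operators on ties, which keeps chains
            # such as 8 - 4 - 2 left-associative.
            if best is None or PRECEDENCE[sym] <= PRECEDENCE[symbols[best]]:
                best = i
    return best


def bracket(symbols):
    '''Return a fully bracketed string by splitting at the pivot.'''
    pivot = weakest_binding_position(symbols)
    if pivot is None:
        return symbols[0]  # a single operand
    return '({} {} {})'.format(
        bracket(symbols[:pivot]), symbols[pivot], bracket(symbols[pivot + 1:]))


print(bracket(['3', '+', '15', '*', '2', '-', '6']))  # ((3 + (15 * 2)) - 6)
print(bracket(['8', '-', '4', '-', '2']))             # ((8 - 4) - 2)
```

Splitting at the rightmost of the weakest operators is what puts the last operation to be performed at the top of the tree.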

I added some parentheses in the Node.__str__() method to actually make a fully-bracketed expression from the input. You can verify with some samples like so:

if __name__ == '__main__':
    expressions = [
        '(3+15)*2+6-3',
        '(a+15)*2+6/3'
    ]

    for expr in expressions:
        root = Node(parse(tokenize(expr)))
        print(root)

... yields

>>> ((((3 + 15) * 2) + 6) - 3)
>>> (((a + 15) * 2) + (6 / 3))

So, if you want to print (or return) this in postfix notation now, you can just swap the operator and the right operand in the format() arguments of the Node.__str__() method:

<<<<<<<<
.format(self.left, self.op, self.right)
======
.format(self.left, self.right, self.op)
>>>>>>>>

If you want the postfix notation returned for further processing instead of just obtaining it as a string, add another method to Node, for example one that returns a flat list of tokens in postfix order:

def postfix(self):
    '''Return this subtree's tokens as a flat list in postfix order.'''
    if self.isleaf():
        return [self.value]
    return self.left.postfix() + self.right.postfix() + [('OPERATOR', self.op)]

and then invoke it from your root node:

pf = root.postfix()
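
As an illustration of the "further processing" a postfix form allows, here is a minimal stack-based evaluator. It assumes the postfix form is available as a flat list of (type, text) tokens, which is hand-written below rather than produced by the code above; `eval_postfix` and `OPS` are hypothetical names.

```python
import operator

OPS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}


def eval_postfix(tokens):
    '''Evaluate a flat list of (type, text) tokens given in postfix order.'''
    stack = []
    for kind, text in tokens:
        if kind == 'OPERATOR':
            right = stack.pop()  # the right operand was pushed last
            left = stack.pop()
            stack.append(OPS[text](left, right))
        elif kind == 'INTEGER':
            stack.append(int(text))
        elif kind == 'FLOAT':
            stack.append(float(text))
        else:
            raise ValueError('Unsupported token type: {}'.format(kind))
    if len(stack) != 1:
        raise ValueError('Malformed postfix expression.')
    return stack[0]


# (3 + 15) * 2 + (6 - 3) in postfix order: 3 15 + 2 * 6 3 - +
postfix = [
    ('INTEGER', '3'), ('INTEGER', '15'), ('OPERATOR', '+'),
    ('INTEGER', '2'), ('OPERATOR', '*'),
    ('INTEGER', '6'), ('INTEGER', '3'), ('OPERATOR', '-'),
    ('OPERATOR', '+'),
]
print(eval_postfix(postfix))  # 39
```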

Step 4: Evaluation

Finally, you can put a method into the Node class to evaluate the expression. This method checks whether we have a leaf node and, if so, returns its value using the correct type. Otherwise, it evaluates its left and right children, applies the operator, and passes the result up the tree.

def eval(self, variables=None):
    # Avoid a shared mutable default argument.
    variables = variables or {}
    if self.isleaf():
        ttype, value = self.value

        if ttype == 'FLOAT':
            return float(value)
        elif ttype == 'INTEGER':
            return int(value)
        elif ttype == 'IDENTIFIER':
            if value in variables:
                return variables[value]
            else:
                raise Exception('Unbound variable: {:s}'.format(value))
        else:
            raise Exception('Unknown type: {:s}'.format(ttype))
    else:
        left = self.left.eval(variables=variables)
        right = self.right.eval(variables=variables)

        if self.op == '+':
            return left + right
        elif self.op == '-':
            return left - right
        elif self.op == '*':
            return left * right
        elif self.op == '/':
            return left / right
        else:
            raise Exception('Unknown operator: {:s}'.format(self.op))

A special feature here is that you can also use variables (like a in my example in Step 3), but you have to map them to actual (untyped) values on evaluation:

if __name__ == '__main__':
    expression = '(a+15)*2+6/3'

    tokens = tokenize(expression)
    hierarchy = parse(tokens)
    root = Node(hierarchy)

    print(root)
    print(root.eval({'a': 7}))

... yields (on Python 2; on Python 3, where / is true division, the last line reads 46.0):

>>> (((a + 15) * 2) + (6 / 3))
>>> 46

Final Thoughts

As already stated, this is far from perfect. In my tests, an expression where a single operator connects two bracketed parts, like (1-2)/(0+5), failed to parse. The thing to check is that parse() uses the absolute position pos_close, not the loop variable j (which is relative to tokens[i:]), both for the recursive slice and for the index jump.
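
For the original question of going from a partially bracketed infix expression straight to postfix, a commonly used alternative to building a tree is the shunting-yard algorithm. The sketch below is that swapped-in technique, not the approach from the steps above; it assumes non-negative integers, the four binary operators and balanced brackets, and `to_postfix` is a hypothetical name.

```python
import re

PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}


def to_postfix(expression):
    '''Convert an infix string to a list of postfix tokens (shunting-yard).'''
    # Note: findall() skips characters it does not recognise, e.g. spaces.
    tokens = re.findall(r'\d+|[-+*/()]', expression)
    output, ops = [], []
    for tok in tokens:
        if tok.isdigit():
            output.append(tok)
        elif tok in PRECEDENCE:
            # Pop operators that bind at least as strongly; ">=" makes
            # equal-precedence operators left-associative.
            while (ops and ops[-1] != '('
                   and PRECEDENCE[ops[-1]] >= PRECEDENCE[tok]):
                output.append(ops.pop())
            ops.append(tok)
        elif tok == '(':
            ops.append(tok)
        else:  # tok == ')'
            while ops and ops[-1] != '(':
                output.append(ops.pop())
            if not ops:
                raise ValueError('Unbalanced closing bracket.')
            ops.pop()  # discard the matching '('
    while ops:
        op = ops.pop()
        if op == '(':
            raise ValueError('Unbalanced opening bracket.')
        output.append(op)
    return output


print(' '.join(to_postfix('(3+15)*2+(6-3)')))    # 3 15 + 2 * 6 3 - +
print(' '.join(to_postfix('((3+15)*2)+(6-3)')))  # 3 15 + 2 * 6 3 - +
```

Both spellings of the expression from the question give the same postfix sequence, so the implied brackets never have to be inserted explicitly.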

Hope it helps somehow; and sorry for this huge response. I was just curious and had a little bit of spare time.

1 answer

solution1
3 ACCEPTED 2018-02-01 15:35:48
