Python: Recursively group operands together with their operators in a list

Question

I have a list that contains symbols and operators, like this:

[['B31', '+', 'W311', '*', ['B21', '+', 'W211', '*', ['B11', '+', 'W111', '*', 'x'], '+', 'W221', '*',
                                       ['B12', '+', 'W121', '*', 'x']], '+', 'W312', '*',
             ['B22', '+', 'W212', '*', ['B11', '+', 'W111', '*', 'x'], '+', 'W222', '*',
              ['B12', '+', 'W121', '*', 'x']]]]

I wish to group operators together with their operands in lists of 3 elements, here that would be

[['B31', '+',
          ['W311', '*',
           ['B21', '+',
            [['W211', '*', [['B11', '+', ['W111', '*', 'x']]],
             '+', ['W221', '*',
                   ['B12', '+', ['W121', '*', 'x']]]]]]]],
         '+', ['W312', '*',
               ['B22', '+',
                [['W212', '*', ['B11', '+', ['W111', '*', 'x']]],
                 '+', ['W222', '*',
                       ['B12', '+', ['W121', '*', 'x']]]]]]]]

My algorithm looks like this:

from typing import List, Union


def group_by_symbol(formula: Union[List, str], symbol: str) -> List:
    """
    Group multiplication in formula: a op b -> [a op b]
    :param formula: contains operations not inside a list.
    :return: operations enclosed in a list.
    """
    modified_formula = formula

    # loop backwards
    for i in range(len(modified_formula) - 1, -1, -1):
        if i > len(modified_formula) - 1:
            continue

        if modified_formula[i] == symbol:
            # introduce parentheses around symbol
            group = [modified_formula[i - 1], modified_formula[i], modified_formula[i + 1]]
            del modified_formula[i:i + 2]
            modified_formula[i - 1] = group
        elif isinstance(modified_formula[i], List) \
                and len(modified_formula[i]) > 3:
            # recurse
            modified_formula[i] = group_by_symbol(modified_formula[i], symbol)

    return modified_formula

It is called like below:

grouped = group_by_symbol(formula, '*')
grouped = group_by_symbol(grouped, '+')

However, when a list contains more than one addition, the desired groups are not created. In the result I obtain, shown below, some lists still contain more than one + symbol, and not all lists have a size of 3:

[[['B31', '+', [['W311', '*', ['B21', '+', ['W211', '*', ['B11', '+', ['W111', '*', 'x']]], '+',
                                               ['W221', '*', ['B12', '+', ['W121', '*', 'x']]]]],
                                '+', ['W312', '*',
                                      ['B22', '+', ['W212', '*', ['B11', '+', ['W111', '*', 'x']]], '+',
                                       ['W222', '*', ['B12', '+', ['W121', '*', 'x']]]]]]]]]

I suspect the error has something to do with an early exit from recursion. However, changing the condition to check that the sublist contains only strings results in endless recursion.

Answer 1

We can dramatically simplify the program by writing a pure function. The numbered steps here correspond to the numbered comments in the program below.

1. If there is no operation, op, we have reached the base case. If the supplied argument, arg, is a list, convert it to an expression; otherwise simply return arg.
2. By induction, there is an operation, op. If the supplied arg is a list, we need to recursively convert it, too. Return a 3-part expression with expr(*arg), the op, and the recursive result, expr(*more).
3. By induction, there is an operation and the supplied arg is not a list. Return a 3-part expression with arg, the op, and the recursive result, expr(*more).

tree = \
  [['B31','+','W311','*',['B21','+','W211','*',['B11','+','W111','*','x'],'+','W221','*',['B12','+','W121','*','x']],'+','W312','*',['B22','+','W212','*',['B11','+','W111','*','x'],'+','W222','*',['B12','+','W121','*','x']]]]

def expr(arg, op = None, *more):
  if not op:
    return expr(*arg) if isinstance(arg, list) else arg #1
  elif isinstance(arg, list):
    return [ expr(*arg), op, expr(*more) ]              #2
  else:
    return [ arg, op, expr(*more) ]                     #3


print(expr(tree))
# ['B31', '+', ['W311', '*', [['B21', '+', ['W211', '*', [['B11', '+', ['W111', '*', 'x']], '+', ['W221', '*', ['B12', '+', ['W121', '*', 'x']]]]]], '+', ['W312', '*', ['B22', '+', ['W212', '*', [['B11', '+', ['W111', '*', 'x']], '+', ['W222', '*', ['B12', '+', ['W121', '*', 'x']]]]]]]]]]
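
Note that this function nests every operation to the right and does not look at operator precedence. A small check on a made-up flat input makes that visible; the function is restated here only so the snippet runs on its own:

```python
def expr(arg, op=None, *more):
    # Same three cases as the function above.
    if not op:
        return expr(*arg) if isinstance(arg, list) else arg
    elif isinstance(arg, list):
        return [expr(*arg), op, expr(*more)]
    else:
        return [arg, op, expr(*more)]

flat = ['a', '*', 'b', '+', 'c']   # illustrative input, not from the question
grouped = expr(flat)
print(grouped)
# ['a', '*', ['b', '+', 'c']]
```

With the usual arithmetic precedence one would expect `[['a', '*', 'b'], '+', 'c']` instead, so explicit sublists are needed wherever the grouping matters.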

Maybe we can verify the output a little better if we convert the expression to a string -

def expr_to_str(expr1, op, expr2):
  return \
  f"({expr_to_str(*expr1) if isinstance(expr1, list) else expr1} {op} {expr_to_str(*expr2) if isinstance(expr2, list) else expr2})"

print(expr_to_str(*expr(tree)))
# (B31 + (W311 * ((B21 + (W211 * ((B11 + (W111 * x)) + (W221 * (B12 + (W121 * x)))))) + (W312 * (B22 + (W212 * ((B11 + (W111 * x)) + (W222 * (B12 + (W121 * x))))))))))

Here's another way using a class -

class expr:
  def __init__(self, x, op = None, *y):
    self.op = op
    self.x = expr(*x) if isinstance(x, list) else x
    self.y = expr(*y) if y else y

  def __str__(self):
    if not self.op:
      return f"{self.x}"
    else:
      return f"({self.x} {self.op} {self.y})"

print(expr(tree))
# (B31 + (W311 * ((B21 + (W211 * ((B11 + (W111 * x)) + (W221 * (B12 + (W121 * x)))))) + (W312 * (B22 + (W212 * ((B11 + (W111 * x)) + (W222 * (B12 + (W121 * x))))))))))

variadic support

In a comment you ask if the expr can support 3-element results and 2-element results. Here is one such flexible implementation -

In the constructor, __init__ , we do a simple case analysis -

1. If the input a is a list with fewer than 4 elements, we don't need to break anything down. Simply map expr over each element of a.
2. By induction, the input a is a list of at least 4 elements, so we need to break it into smaller expressions. Construct an expression of the first element, expr(a[0]), the second element, expr(a[1]), and the recursive result of all remaining elements, expr(a[2::]).
3. By induction, the input a is not a list, i.e. it is a single item. Set the expression's data to the singleton, [ a ].

In the __str__ method, we do a similar analysis to convert our expression's data into a string -

1. When self.data is empty, return the empty string, "".
2. By induction, self.data is not empty. If it has fewer than 2 elements (a singleton), return the singleton result, f"{self.data[0]}".
3. By induction, self.data has at least 2 elements. Return a (...)-enclosed string where each part is recursively converted to a str and joined with a space, " ".

class expr:
  def __init__(self, a):
    if isinstance(a, list):
      if len(a) < 4:
        self.data = [ expr(x) for x in a ]                   #1
      else:
        self.data = [ expr(a[0]), expr(a[1]), expr(a[2::]) ] #2
    else:
      self.data = [ a ]                                      #3

  def __str__(self):
    if not self.data:
      return ""                                              #1 empty
    elif len(self.data) < 2:
      return f"{self.data[0]}"                               #2 singleton
    else:
      return "(" + " ".join(str(x) for x in self.data) + ")" #3 variadic

print(expr(tree))
# (B31 + (W311 * ((B21 + (W211 * ((B11 + (W111 * x)) + (W221 * (B12 + (W121 * x)))))) + (W312 * (B22 + (W212 * ((B11 + (W111 * x)) + (W222 * (B12 + (W121 * x))))))))))

print(expr([[ "¬", ["a", "+", "b"]], "and", [["length", "x"], ">", 0]]))
# ((¬ (a + b)) and ((length x) > 0))
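
To see the constructor cases one at a time, here is a small sketch that restates the class and renders a few short inputs. The sample inputs are made up for illustration:

```python
class expr:
    def __init__(self, a):
        if isinstance(a, list):
            if len(a) < 4:
                self.data = [expr(x) for x in a]                 # short list: map
            else:
                self.data = [expr(a[0]), expr(a[1]), expr(a[2:])]  # long list: split
        else:
            self.data = [a]                                      # single item

    def __str__(self):
        if not self.data:
            return ""
        elif len(self.data) < 2:
            return f"{self.data[0]}"
        else:
            return "(" + " ".join(str(x) for x in self.data) + ")"

samples = ["x", ["neg", "x"], ["a", "+", "b"], ["a", "+", "b", "*", "c"]]
rendered = [str(expr(s)) for s in samples]
print(rendered)
# ['x', '(neg x)', '(a + b)', '(a + (b * c))']
```

Only the last sample is long enough to trigger the split into a head, an operator, and a recursive tail.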

breaking it down

By decomposing a complex problem into smaller parts, it becomes easier to solve the sub-problems, and it affords us more flexibility and control. For what it's worth, this technique does not rely on Python's specific OOP mechanisms. These are ordinary, well-defined, pure functions -

def unit(): return ('unit',)
def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)
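
Since each constructor simply returns a tagged tuple, an expression can also be built by hand and inspected with ordinary unpacking and comparison. A short illustration, using a made-up expression:

```python
def unit(): return ('unit',)
def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)

# 1 + 2 built by hand; the operator comes first in the tuple
e = binary(nullary('+'), nullary(1), nullary(2))
print(e)
# ('binary', ('nullary', '+'), ('nullary', 1), ('nullary', 2))

tag, op, left, right = e          # tuples unpack positionally
print(tag, op == nullary('+'))    # tuples compare by value
# binary True
```

Comparison by value is what later allows a check such as `op == expr("+")` to work without any extra equality code.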

Now using a flat case analysis as we've done before, we implement our recursive expression constructor expr -

1. If the input a is not a list, it is a single value. Construct a nullary expression with a.
2. By induction, the input is a list. If the list is empty, construct an empty result, the unit expression.
3. By induction, the input is not an empty list. If it contains exactly one element, construct a nullary expression with the only element, expr(a[0]).
4. By induction, the input contains at least two elements. If the input has exactly two elements, construct a unary expression with expr(a[0]) and expr(a[1]).
5. By induction, the input contains at least three elements. If the operator is in infix position (is_infix), convert it to prefix position: construct a binary expression with expr(a[0]) and expr(a[1]) in swapped position, and the recursive result expr(a[2::]).
6. By induction, the input contains at least three elements and is not in infix position. Construct an ordinary (prefix position) binary expression of expr(a[0]) and expr(a[1]) and the recursive result expr(a[2::]).

infix_ops = set([ '+', '-', '*', '/', '>', '<', 'and', 'or' ])

def is_infix (a):
  return a[1] in infix_ops

def expr(a):
  if not isinstance(a, list):
    return nullary(a)                                    #1
  elif len(a) == 0:
    return unit()                                        #2
  elif len(a) == 1:
    return nullary(expr(a[0]))                           #3
  elif len(a) == 2:
    return unary(expr(a[0]), expr(a[1]))                 #4
  elif is_infix(a):
    return binary(expr(a[1]), expr(a[0]), expr(a[2::]))  #5
  else:
    return binary(expr(a[0]), expr(a[1]), expr(a[2::]))  #6
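
One caveat: is_infix looks up a[1] in a set, so it raises a TypeError if the second element is itself a list, because lists are unhashable. A hedged variant, under the hypothetical name `is_infix_safe`, checks the type first:

```python
infix_ops = {'+', '-', '*', '/', '>', '<', 'and', 'or'}

def is_infix_safe(a):
    # Hypothetical helper, not part of the answer:
    # only a string in second position can be an infix operator.
    return len(a) > 1 and isinstance(a[1], str) and a[1] in infix_ops

print(is_infix_safe(['a', '+', 'b']))     # True
print(is_infix_safe(['f', ['a'], 'b']))   # False
```

Swapping it in for is_infix would send inputs like the second one to the prefix branch instead of failing.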

Now to see the result -

tree2 = \
  [[ "¬", ["a", "+", "b"]], "and", [["length", "x"], ">", 0]]

print(expr(tree2))
# ('binary', ('nullary', 'and'), ('unary', ('nullary', '¬'), ('binary', ('nullary', '+'), ('nullary', 'a'), ('nullary', ('nullary', 'b')))), ('nullary', ('binary', ('nullary', '>'), ('unary', ('nullary', 'length'), ('nullary', 'x')), ('nullary', ('nullary', 0)))))

This is just one possible representation of our expressions. Because we implemented our expressions using tuple, Python can print them directly, although the output is verbose. By contrast, here's how Python represents objects by default -

class foo: pass
f = foo()
print(f)
# <__main__.foo object at 0x7f2ba03bc8e0>

What's important here is that our expression data structure is well-defined and we can easily perform computations on it or represent it in other ways -

def expr_to_str(m):
  if not isinstance(m, tuple):
    return str(m)
  elif m[0] == "unit":
    return ""
  elif m[0] == "nullary":
    return expr_to_str(m[1])
  elif m[0] == "unary":
    return f"({expr_to_str(m[1])} {expr_to_str(m[2])})"
  elif m[0] == "binary":
    return f"({expr_to_str(m[1])} {expr_to_str(m[2])} {expr_to_str(m[3])})"
  else:
    raise TypeError("invalid expression type", m[0])

print(expr_to_str(expr(tree2)))
# (and (¬ (+ a b)) (> (length x) 0))
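
The same structure can be rendered in other notations as well. As an illustration, here is a sketch of a hypothetical `expr_to_infix` that places the operator of a binary node between its operands. The constructors and expr are restated so the snippet runs on its own:

```python
def unit(): return ('unit',)
def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)

infix_ops = {'+', '-', '*', '/', '>', '<', 'and', 'or'}

def expr(a):
    if not isinstance(a, list):
        return nullary(a)
    elif len(a) == 0:
        return unit()
    elif len(a) == 1:
        return nullary(expr(a[0]))
    elif len(a) == 2:
        return unary(expr(a[0]), expr(a[1]))
    elif a[1] in infix_ops:
        return binary(expr(a[1]), expr(a[0]), expr(a[2:]))
    else:
        return binary(expr(a[0]), expr(a[1]), expr(a[2:]))

def expr_to_infix(m):
    # Hypothetical renderer: same cases as expr_to_str,
    # except that binary nodes print as (left op right).
    if not isinstance(m, tuple):
        return str(m)
    elif m[0] == "unit":
        return ""
    elif m[0] == "nullary":
        return expr_to_infix(m[1])
    elif m[0] == "unary":
        return f"({expr_to_infix(m[1])} {expr_to_infix(m[2])})"
    elif m[0] == "binary":
        return f"({expr_to_infix(m[2])} {expr_to_infix(m[1])} {expr_to_infix(m[3])})"
    else:
        raise TypeError("invalid expression type", m[0])

print(expr_to_infix(expr([3, "+", 2, "*", 5, "-", 1])))
# (3 + (2 * (5 - 1)))
```

The tuple does not record whether a binary node came from infix or prefix input, so this sketch renders both the same way.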

evaluating an expression

So what if we wanted to evaluate one of our expressions?

m = expr([3, "+", 2, "*", 5, "-", 1])

print(expr_to_str(m))
# (+ 3 (* 2 (- 5 1)))

print(eval_expr(m))
# 11

You're just a few steps away from being able to write eval_expr -

def eval_expr(m):
  if not isinstance(m, tuple):
    return m
  elif m[0] == "unit":
    return None
  elif m[0] == "nullary":
    return eval0(m[1])
  elif m[0] == "unary":
    return eval1(m[1], m[2])
  elif m[0] == "binary":
    return eval2(m[1], m[2], m[3])
  else:
    raise TypeError("invalid expression type", m[0])

See, complex problems are easier when broken down into small parts. Now we just write eval0, eval1, and eval2 -

def eval0(op):
  return eval_expr(op)

def eval1(op, a):
  if op == expr("not"):      # or op == expr("¬") ...
    return not eval_expr(a)
  elif op == expr("neg"):    # or op == expr("~") ...
    return -eval_expr(a)
  # +, ++, --, etc...
  else:
    raise ValueError("invalid op", op)

def eval2(op, a, b):
  if op == expr("+"):
    return eval_expr(a) + eval_expr(b)
  elif op == expr("-"):
    return eval_expr(a) - eval_expr(b)
  elif op == expr("*"):
    return eval_expr(a) * eval_expr(b)
  elif op == expr("/"):
    return eval_expr(a) / eval_expr(b)
  elif op == expr("and"):
    return eval_expr(a) and eval_expr(b)
  # >, <, or, xor, etc...
  else:
    raise ValueError("invalid op", op)
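
As the number of operators grows, the if/elif chains can be replaced with lookup tables. The following is a sketch of that alternative, not the implementation above: `UNARY_OPS`, `BINARY_OPS`, and `lookup` are hypothetical names, and the operator functions come from Python's standard `operator` module.

```python
import operator

def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)

# Hypothetical lookup tables: operator symbol -> Python function.
UNARY_OPS = {'not': operator.not_, 'neg': operator.neg}
BINARY_OPS = {
    '+': operator.add, '-': operator.sub,
    '*': operator.mul, '/': operator.truediv,
    '>': operator.gt, '<': operator.lt,
}

def lookup(table, op):
    # The operator is itself an expression such as ('nullary', '+'),
    # so evaluate it to get the plain symbol before the table lookup.
    name = eval_expr(op)
    if name not in table:
        raise ValueError("invalid op", name)
    return table[name]

def eval_expr(m):
    if not isinstance(m, tuple):
        return m
    tag = m[0]
    if tag == 'unit':
        return None
    elif tag == 'nullary':
        return eval_expr(m[1])
    elif tag == 'unary':
        return lookup(UNARY_OPS, m[1])(eval_expr(m[2]))
    elif tag == 'binary':
        return lookup(BINARY_OPS, m[1])(eval_expr(m[2]), eval_expr(m[3]))
    else:
        raise TypeError("invalid expression type", tag)

# 3 + (2 * (5 - 1)), built directly from the constructors
m = binary(nullary('+'), nullary(3),
           binary(nullary('*'), nullary(2),
                  binary(nullary('-'), nullary(5), nullary(1))))
print(eval_expr(m))
# 11
```

The tables leave out `and` and `or` on purpose: a table entry always receives both operands already evaluated, so it cannot short-circuit the way the explicit branch does.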

Let's see a mixture of expressions now -

print(eval_expr(expr([True, 'and', ['not', False]])))
# True

print(eval_expr(expr(['neg', [9, '*', 11]])))
# -99

print(eval_expr(expr(['stay', '+', 'inside'])))
# stayinside

You can even define your own functions -

def eval1(op, a):
  if op == expr("not"):
    return not eval_expr(a)
  elif op == expr("neg"):
    return -eval_expr(a)
  elif op == expr('scream'):
    return eval_expr(a).upper() # make uppercase!
  else:
    raise ValueError("invalid op", op)

And use them in your expressions -

print(eval_expr(expr(["scream", ["stay", "+", "inside"]])))
# STAYINSIDE

solution1
1 ACCEPTED 2020-04-13 18:25:33
