Python: Recursively group operands together with their operators in a list

I have a list that contains symbols and operators, like this:

[['B31', '+', 'W311', '*', ['B21', '+', 'W211', '*', ['B11', '+', 'W111', '*', 'x'], '+', 'W221', '*',
                                       ['B12', '+', 'W121', '*', 'x']], '+', 'W312', '*',
             ['B22', '+', 'W212', '*', ['B11', '+', 'W111', '*', 'x'], '+', 'W222', '*',
              ['B12', '+', 'W121', '*', 'x']]]]

I wish to group operators together with their operands in lists of 3 elements, here that would be

[['B31', '+',
          ['W311', '*',
           ['B21', '+',
            [['W211', '*', [['B11', '+', ['W111', '*', 'x']]],
             '+', ['W221', '*',
                   ['B12', '+', ['W121', '*', 'x']]]]]]]],
         '+', ['W312', '*',
               ['B22', '+',
                [['W212', '*', ['B11', '+', ['W111', '*', 'x']]],
                 '+', ['W222', '*',
                       ['B12', '+', ['W121', '*', 'x']]]]]]]]

My algorithm looks like this:

from typing import List, Union


def group_by_symbol(formula: Union[List, str], symbol: str) -> List:
    """
    Group one operation in formula: a op b -> [a op b]
    :param formula: contains operations not inside a list.
    :param symbol: the operator to group, e.g. '*'.
    :return: operations enclosed in a list.
    """
    modified_formula = formula  # same list object: the input is modified in place

    # loop backwards
    for i in range(len(modified_formula) - 1, -1, -1):
        if i > len(modified_formula) - 1:
            continue

        if modified_formula[i] == symbol:
            # introduce parentheses around symbol
            group = [modified_formula[i - 1], modified_formula[i], modified_formula[i + 1]]
            del modified_formula[i:i + 2]
            modified_formula[i - 1] = group
        elif isinstance(modified_formula[i], List) \
                and len(modified_formula[i]) > 3:
            # recurse
            modified_formula[i] = group_by_symbol(modified_formula[i], symbol)

    return modified_formula

It is called like below:

grouped = group_by_symbol(formula, '*')
grouped = group_by_symbol(grouped, '+')

However, when a list contains more than one addition, the desired groups are not created. In the result I obtain, some lists contain more than one + symbol and not all lists have a size of 3:

[[['B31', '+', [['W311', '*', ['B21', '+', ['W211', '*', ['B11', '+', ['W111', '*', 'x']]], '+',
                                               ['W221', '*', ['B12', '+', ['W121', '*', 'x']]]]],
                                '+', ['W312', '*',
                                      ['B22', '+', ['W212', '*', ['B11', '+', ['W111', '*', 'x']]], '+',
                                       ['W222', '*', ['B12', '+', ['W121', '*', 'x']]]]]]]]]

I suspect the error is caused by an early exit from the recursion. However, changing the condition to check that the sublist contains only strings results in endless recursion.

We can dramatically simplify the program by writing a pure function. The numbered comments here correspond to the numbers in the source code below.

  1. If there is no operation, op, we have reached the base case. If the supplied argument, arg, is a list, convert it to an expression; otherwise simply return arg.
  2. By induction, there is an operation, op. If the supplied arg is a list, we need to recursively convert it, too. Return a 3-part expression with expr(*arg), the op, and the recursive result, expr(*more).
  3. By induction, there is an operation and the supplied arg is not a list. Return a 3-part expression with arg, the op, and the recursive result, expr(*more).

tree = \
  [['B31','+','W311','*',['B21','+','W211','*',['B11','+','W111','*','x'],'+','W221','*',['B12','+','W121','*','x']],'+','W312','*',['B22','+','W212','*',['B11','+','W111','*','x'],'+','W222','*',['B12','+','W121','*','x']]]]

def expr(arg, op = None, *more):
  if not op:
    return expr(*arg) if isinstance(arg, list) else arg #1
  elif isinstance(arg, list):
    return [ expr(*arg), op, expr(*more) ]              #2
  else:
    return [ arg, op, expr(*more) ]                     #3


print(expr(tree))
# ['B31', '+', ['W311', '*', [['B21', '+', ['W211', '*', [['B11', '+', ['W111', '*', 'x']], '+', ['W221', '*', ['B12', '+', ['W121', '*', 'x']]]]]], '+', ['W312', '*', ['B22', '+', ['W212', '*', [['B11', '+', ['W111', '*', 'x']], '+', ['W222', '*', ['B12', '+', ['W121', '*', 'x']]]]]]]]]]

Maybe we can verify the output a little better if we convert the expression to a string -

def expr_to_str(expr1, op, expr2):
  return \
  f"({expr_to_str(*expr1) if isinstance(expr1, list) else expr1} {op} {expr_to_str(*expr2) if isinstance(expr2, list) else expr2})"

print(expr_to_str(*expr(tree)))
# (B31 + (W311 * ((B21 + (W211 * ((B11 + (W111 * x)) + (W221 * (B12 + (W121 * x)))))) + (W312 * (B22 + (W212 * ((B11 + (W111 * x)) + (W222 * (B12 + (W121 * x))))))))))
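
Note that expr nests strictly from left to right: each operator takes everything to its right as its second operand, so * does not bind tighter than +. If the usual precedence is wanted, as the two-pass call in the question suggests, one option is to split each list at its lowest-precedence operator first. The following is only a minimal sketch and not part of the answer above; group_by_precedence and its precedence parameter are hypothetical names, and it assumes operators always sit at the odd positions of a well-formed infix list.

```python
def group_by_precedence(tokens, precedence=(("+", "-"), ("*", "/"))):
    """Nest a flat infix list into [left, op, right] triples (hypothetical helper)."""
    if not isinstance(tokens, list):
        return tokens                                       # a bare symbol
    if len(tokens) == 1:
        return group_by_precedence(tokens[0], precedence)   # unwrap [x]
    if len(tokens) % 2 == 0:
        raise ValueError(f"malformed infix list: {tokens!r}")
    for ops in precedence:                                  # lowest precedence first
        # Scan operator positions right to left, so operators of equal
        # precedence end up grouped left to right.
        for i in range(len(tokens) - 2, 0, -2):
            if tokens[i] in ops:
                left = group_by_precedence(tokens[:i], precedence)
                right = group_by_precedence(tokens[i + 1:], precedence)
                return [left, tokens[i], right]
    raise ValueError(f"no known operator in {tokens!r}")


print(group_by_precedence(["B11", "+", "W111", "*", "x"]))
# ['B11', '+', ['W111', '*', 'x']]
print(group_by_precedence([3, "+", 2, "*", 5, "-", 1]))
# [[3, '+', [2, '*', 5]], '-', 1]
```

Unlike group_by_symbol in the question, this sketch builds new lists and leaves its input untouched.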

Here's another way using a class -

class expr:
  def __init__(self, x, op = None, *y):
    self.op = op
    self.x = expr(*x) if isinstance(x, list) else x
    self.y = expr(*y) if y else y

  def __str__(self):
    if not self.op:
      return f"{self.x}"
    else:
      return f"({self.x} {self.op} {self.y})"

print(expr(tree))
# (B31 + (W311 * ((B21 + (W211 * ((B11 + (W111 * x)) + (W221 * (B12 + (W121 * x)))))) + (W312 * (B22 + (W212 * ((B11 + (W111 * x)) + (W222 * (B12 + (W121 * x))))))))))

variadic support

In a comment you ask if the expr can support 3-element results and 2-element results. Here is one such flexible implementation -

In the constructor, __init__, we do a simple case analysis -

  1. If the input a is a list with fewer than 4 elements, we don't need to break anything down. Simply map expr over each element of a.
  2. By induction, the input a is a list of at least 4 elements, so we need to break it into smaller expressions. Construct an expression of the first element, expr(a[0]), the second element, expr(a[1]), and the recursive result of all remaining elements, expr(a[2::]).
  3. By induction, the input a is not a list, i.e. it is a single item. Set the expression's data to the singleton, [ a ].

In the __str__ method, we do a similar analysis to convert our expression's data into a string -

  1. When self.data is empty, return the empty string, "".
  2. By induction, self.data is not empty. If it has fewer than 2 elements (a singleton), return the singleton result, f"{self.data[0]}".
  3. By induction, self.data has at least 2 elements. Return a (...)-enclosed string where each part is recursively converted to a str and joined with a space, " ".

class expr:
  def __init__(self, a):
    if isinstance(a, list):
      if len(a) < 4:
        self.data = [ expr(x) for x in a ]                   #1
      else:
        self.data = [ expr(a[0]), expr(a[1]), expr(a[2::]) ] #2
    else:
      self.data = [ a ]                                      #3

  def __str__(self):
    if not self.data:
      return ""                                              #1 empty
    elif len(self.data) < 2:
      return f"{self.data[0]}"                               #2 singleton
    else:
      return "(" + " ".join(str(x) for x in self.data) + ")" #3 variadic

print(expr(tree))
# (B31 + (W311 * ((B21 + (W211 * ((B11 + (W111 * x)) + (W221 * (B12 + (W121 * x)))))) + (W312 * (B22 + (W212 * ((B11 + (W111 * x)) + (W222 * (B12 + (W121 * x))))))))))

print(expr([[ "¬", ["a", "+", "b"]], "and", [["length", "x"], ">", 0]]))
# ((¬ (a + b)) and ((length x) > 0))

breaking it down

By decomposing a complex problem into smaller parts, it is easier to solve the sub-problems, and it affords us more flexibility and control. For what it's worth, this technique does not rely on Python's specific OOP mechanisms. These are ordinary, well-defined, pure functions -

def unit(): return ('unit',)
def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)
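
As a small illustration of these constructors (a sketch only; the variable names one_plus_two and same_again are made up for this example), here is the expression 1 + 2 built by hand, with the operator in first position. The constructors are repeated so the block runs on its own.

```python
def unit(): return ('unit',)
def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)

# 1 + 2, built by hand: the tag comes first, then the operator, then the operands
one_plus_two = binary(nullary('+'), nullary(1), nullary(2))
print(one_plus_two)
# ('binary', ('nullary', '+'), ('nullary', 1), ('nullary', 2))

# plain tuples compare by value, so equal parts give equal expressions
same_again = binary(nullary('+'), nullary(1), nullary(2))
print(one_plus_two == same_again)
# True
```

Because the results are plain tuples, the tag in position 0 is all a consumer needs to inspect to decide how to handle a node.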

Now using a flat case analysis as we've done before, we implement our recursive expression constructor expr -

  1. If the input a is not a list, it is a single value. Construct a nullary expression with a.
  2. By induction, the input is a list. If the list is empty, construct an empty result, the unit expression.
  3. By induction, the input is not an empty list. If it contains exactly one element, construct a nullary expression with the only element, expr(a[0]).
  4. By induction, the input contains at least two elements. If the input is exactly two elements, construct a unary expression with expr(a[0]) and expr(a[1]).
  5. By induction, the input contains at least three elements. If the input operator is in infix position (is_infix), convert to prefix position. Construct a binary expression with expr(a[0]) and expr(a[1]) in swapped position, and the recursive result expr(a[2::]).
  6. By induction, the input contains at least three elements and is not in infix position. Construct an ordinary (prefix position) binary expression of expr(a[0]) and expr(a[1]) and the recursive result expr(a[2::]).

infix_ops = set([ '+', '-', '*', '/', '>', '<', 'and', 'or' ])

def is_infix (a):
  return a[1] in infix_ops

def expr(a):
  if not isinstance(a, list):
    return nullary(a)                                    #1
  elif len(a) == 0:
    return unit()                                        #2
  elif len(a) == 1:
    return nullary(expr(a[0]))                           #3
  elif len(a) == 2:
    return unary(expr(a[0]), expr(a[1]))                 #4
  elif is_infix(a):
    return binary(expr(a[1]), expr(a[0]), expr(a[2::]))  #5
  else:
    return binary(expr(a[0]), expr(a[1]), expr(a[2::]))  #6

Now to see the result -

tree2 = \
  [[ "¬", ["a", "+", "b"]], "and", [["length", "x"], ">", 0]]

print(expr(tree2))
# ('binary', ('nullary', 'and'), ('unary', ('nullary', '¬'), ('binary', ('nullary', '+'), ('nullary', 'a'), ('nullary', ('nullary', 'b')))), ('nullary', ('binary', ('nullary', '>'), ('unary', ('nullary', 'length'), ('nullary', 'x')), ('nullary', ('nullary', 0)))))

This is just one possible representation of our expressions. Because we implemented our expressions using tuple, Python is able to print them out, albeit verbosely. By contrast, here's how Python chooses to represent objects -

class foo: pass
f = foo()
print(f)
# <__main__.foo object at 0x7f2ba03bc8e0>

What's important here is that our expression data structure is well-defined and we can easily perform computations on it or represent it in other ways -

def expr_to_str(m):
  if not isinstance(m, tuple):
    return str(m)
  elif m[0] == "unit":
    return ""
  elif m[0] == "nullary":
    return expr_to_str(m[1])
  elif m[0] == "unary":
    return f"({expr_to_str(m[1])} {expr_to_str(m[2])})"
  elif m[0] == "binary":
    return f"({expr_to_str(m[1])} {expr_to_str(m[2])} {expr_to_str(m[3])})"
  else:
    raise TypeError("invalid expression type", m[0])

print(expr_to_str(expr(tree2)))
# (and (¬ (+ a b)) (> (length x) 0))
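
The same structure can be rendered in other ways too. As a minimal sketch, the hypothetical helper expr_to_infix below (not part of the answer above) prints binary nodes with the operator between its operands. It assumes the tagged tuples produced by the constructors above, which are repeated here so the block runs on its own, and it treats every binary node as infix, including ones that were written in prefix position.

```python
def nullary(op): return ('nullary', op)
def unary(op, a): return ('unary', op, a)
def binary(op, a, b): return ('binary', op, a, b)

def expr_to_infix(m):
  # hypothetical variant of expr_to_str: the operator goes between the operands
  if not isinstance(m, tuple):
    return str(m)
  elif m[0] == "unit":
    return ""
  elif m[0] == "nullary":
    return expr_to_infix(m[1])
  elif m[0] == "unary":
    return f"({expr_to_infix(m[1])} {expr_to_infix(m[2])})"
  elif m[0] == "binary":
    return f"({expr_to_infix(m[2])} {expr_to_infix(m[1])} {expr_to_infix(m[3])})"
  else:
    raise TypeError("invalid expression type", m[0])

# 2 * (5 - 1), built by hand from the constructors
product = binary(nullary("*"), nullary(2),
                 binary(nullary("-"), nullary(5), nullary(1)))
print(expr_to_infix(product))
# (2 * (5 - 1))
```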

evaluating an expression

So what if we wanted to evaluate one of our expressions?

m = expr([3, "+", 2, "*", 5, "-", 1])

print(expr_to_str(m))
# (+ 3 (* 2 (- 5 1)))

print(eval_expr(m))
# 11

You're just a few steps away from being able to write eval_expr -

def eval_expr(m):
  if not isinstance(m, tuple):
    return m
  elif m[0] == "unit":
    return None
  elif m[0] == "nullary":
    return eval0(m[1])
  elif m[0] == "unary":
    return eval1(m[1], m[2])
  elif m[0] == "binary":
    return eval2(m[1], m[2], m[3])
  else:
    raise TypeError("invalid expression type", m[0])

See, complex problems are easier when broken down into small parts. Now we just write eval0, eval1, and eval2 -

def eval0(op):
  return eval_expr(op)

def eval1(op, a):
  if op == expr("not"):      # or op == expr("¬") ...
    return not eval_expr(a)
  elif op == expr("neg"):    # or op == expr("~") ...
    return -eval_expr(a)
  # +, ++, --, etc...
  else:
    raise ValueError("invalid op", op)

def eval2(op, a, b):
  if op == expr("+"):
    return eval_expr(a) + eval_expr(b)
  elif op == expr("-"):
    return eval_expr(a) - eval_expr(b)
  elif op == expr("*"):
    return eval_expr(a) * eval_expr(b)
  elif op == expr("/"):
    return eval_expr(a) / eval_expr(b)
  elif op == expr("and"):
    return eval_expr(a) and eval_expr(b)
  # >, <, or, xor, etc...
  else:
    raise ValueError("invalid op", op)
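
As the number of operators grows, the if/elif chains could be swapped for lookup tables. The following is only a sketch of that alternative, not the approach used above: eval_tagged, lookup, UNARY_OPS and BINARY_OPS are hypothetical names, the functions come from Python's standard operator module, and the expressions are written out as tagged tuples by hand so the block runs on its own. "and" and "or" are left out on purpose, because a table of plain functions evaluates both operands before calling and would lose short-circuiting.

```python
import operator

# hypothetical lookup tables, keyed by the operator names used above
UNARY_OPS = {"not": operator.not_, "neg": operator.neg}
BINARY_OPS = {
    "+": operator.add,
    "-": operator.sub,
    "*": operator.mul,
    "/": operator.truediv,
    ">": operator.gt,
    "<": operator.lt,
}

def lookup(table, op):
    name = eval_tagged(op)  # unwrap ('nullary', '+') to '+'
    if name not in table:
        raise ValueError("invalid op", name)
    return table[name]

def eval_tagged(m):
    if not isinstance(m, tuple):
        return m
    elif m[0] == "unit":
        return None
    elif m[0] == "nullary":
        return eval_tagged(m[1])
    elif m[0] == "unary":
        return lookup(UNARY_OPS, m[1])(eval_tagged(m[2]))
    elif m[0] == "binary":
        return lookup(BINARY_OPS, m[1])(eval_tagged(m[2]), eval_tagged(m[3]))
    else:
        raise TypeError("invalid expression type", m[0])

# 3 + (2 * 5) as tagged tuples
example = ("binary", ("nullary", "+"), ("nullary", 3),
           ("binary", ("nullary", "*"), ("nullary", 2), ("nullary", ("nullary", 5))))
print(eval_tagged(example))
# 13

negated = ("unary", ("nullary", "neg"), ("nullary", 4))
print(eval_tagged(negated))
# -4
```

Adding an operator then means adding one table entry instead of another elif branch.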

Let's see a mixture of expressions now -

print(eval_expr(expr([True, 'and', ['not', False]])))
# True

print(eval_expr(expr(['neg', [9, '*', 11]])))
# -99

print(eval_expr(expr(['stay', '+', 'inside'])))
# stayinside

You can even define your own functions -

def eval1(op, a):
  if op == expr("not"):
    return not eval_expr(a)
  elif op == expr("neg"):
    return -eval_expr(a)
  elif op == expr("scream"):
    return eval_expr(a).upper() # make uppercase!
  else:
    raise ValueError("invalid op", op)

And use them in your expressions -

print(eval_expr(expr(["scream", ["stay", "+", "inside"]])))
# STAYINSIDE
