How to split a string of a mathematical expression in Python?

Question

I made a program which converts infix to postfix in Python. The problem is with the input. If I enter something like this (it will be a string):

( ( 73 + ( ( 34 - 72 ) / ( 33 - 3 ) ) ) + ( 56 + ( 95 - 28 ) ) )

it will be split with .split() and the program works correctly. But I want the user to be able to enter something like this:

((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )

As you can see, I want the blank spaces to be insignificant, while the program still splits the string into parentheses, integers (not single digits) and operators.

I tried to solve it with a for loop, but I don't know how to capture the whole number (73, 34, 72) instead of one digit at a time (7, 3, 3, 4, 7, 2).

To sum up, what I want is to split a string like ((81 * 6) /42+ (3-1)) into:

[(, (, 81, *, 6, ), /, 42, +, (, 3, -, 1, ), )]

Answer 1

Tree with `ast`

You could use ast to get a tree of the expression:

import ast

source = '((81 * 6) /42+ (3-1))'
node = ast.parse(source) 

def show_children(node, level=0):
    # Numbers are ast.Constant nodes (ast.Num is deprecated since Python 3.8)
    if isinstance(node, ast.Constant):
        print(' ' * level + str(node.value))
    else:
        print(' ' * level + str(node))
    for child in ast.iter_child_nodes(node):
        show_children(child, level+1)

show_children(node)

It outputs:

<_ast.Module object at 0x7f56abbc5490>
 <_ast.Expr object at 0x7f56abbc5350>
  <_ast.BinOp object at 0x7f56abbc5450>
   <_ast.BinOp object at 0x7f56abbc5390>
    <_ast.BinOp object at 0x7f56abb57cd0>
     81
     <_ast.Mult object at 0x7f56abbd0dd0>
     6
    <_ast.Div object at 0x7f56abbd0e50>
    42
   <_ast.Add object at 0x7f56abbd0cd0>
   <_ast.BinOp object at 0x7f56abb57dd0>
    3
    <_ast.Sub object at 0x7f56abbd0d50>
    1

As @user2357112 wrote in the comments: ast.parse interprets Python syntax, not mathematical expressions. (1+2)(3+4) would be parsed as a function call, and list comprehensions would be accepted, even though they probably shouldn't be considered valid mathematical expressions.
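Since the question is ultimately about converting infix to postfix, note that a post-order walk of this tree already yields postfix order. The sketch below assumes Python 3.8+ (numbers are `ast.Constant` nodes) and accepts only integers and `+ - * /`; the function name `ast_to_postfix` is illustrative. Rejecting every other node type also addresses the caveat above: `(1+2)(3+4)` parses as an `ast.Call` and raises `ValueError`, and so does unary minus such as `-3`.

```python
import ast

# Assumption: only these operator node types count as arithmetic
OPS = {ast.Add: '+', ast.Sub: '-', ast.Mult: '*', ast.Div: '/'}


def ast_to_postfix(source):
    """Return the postfix token list for an integer arithmetic expression."""
    # mode='eval' only accepts a single expression, not statements
    tree = ast.parse(source, mode='eval')
    out = []

    def walk(node):
        if isinstance(node, ast.BinOp) and type(node.op) in OPS:
            # Post-order traversal: left subtree, right subtree, then operator
            walk(node.left)
            walk(node.right)
            out.append(OPS[type(node.op)])
        elif isinstance(node, ast.Constant) and type(node.value) is int:
            out.append(str(node.value))
        else:
            # Calls like (1+2)(3+4), names, comprehensions, unary minus, ...
            raise ValueError('unsupported syntax: ' + ast.dump(node))

    walk(tree.body)
    return out


print(ast_to_postfix('((81 * 6) /42+ (3-1))'))
```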

List with a regex

If you want a flat structure, a regex could work:

import re

number_or_symbol = re.compile(r'(\d+|[^ 0-9])')  # raw string: avoids invalid-escape warnings
print(re.findall(number_or_symbol, source))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

It looks for either:

one or more digits
or any character which isn't a digit or a space

Once you have a list of elements, you could check whether the syntax is correct, for example with a stack to check that the parentheses match, or by checking that every element is a known one.
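A minimal sketch of that check, assuming the only known tokens are integers, parentheses and `+ - * /` (the helper names `tokenize` and `check_tokens` are hypothetical). The stack holds the positions of opening parentheses that have not been closed yet:

```python
import re


def tokenize(source):
    # One or more digits, or any single character that is not a digit or a space
    return re.findall(r'\d+|[^ 0-9]', source)


def check_tokens(tokens, known=frozenset('()+-*/')):
    """Raise ValueError if parentheses don't match or a token is unknown."""
    stack = []  # indices of '(' tokens that are still open
    for i, tok in enumerate(tokens):
        if tok == '(':
            stack.append(i)
        elif tok == ')':
            if not stack:
                raise ValueError('unmatched ) at token %d' % i)
            stack.pop()
        elif not (tok.isdigit() or tok in known):
            raise ValueError('unknown token %r at token %d' % (tok, i))
    if stack:
        raise ValueError('unmatched ( at token %d' % stack[-1])
    return tokens


print(check_tokens(tokenize('((81 * 6) /42+ (3-1))')))
```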

Answer 2

You need to implement a very simple tokenizer for your input. You have the following types of tokens:

(
)
+
-
*
/
\d+

You can find them in your input string, separated by all sorts of whitespace.

So a first step is to process the string from start to finish, extract these tokens, and then do your parsing on the tokens rather than on the string itself.

A nifty way to do this is to use the following regular expression: r'\s*([()+*/-]|\d+)'. You can then:

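import re

the_input = '(3+(2*5))'
tokens = []
# Optional whitespace, then one token: a parenthesis, an operator or an integer
tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
current_pos = 0
while current_pos < len(the_input):
    # match() is anchored at current_pos, so unexpected characters are not skipped
    match = tokenizer.match(the_input, current_pos)
    if match is None:
        raise SyntaxError('Syntax error at position %d' % current_pos)
    tokens.append(match.group(1))
    current_pos = match.end()
print(tokens)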

This will print ['(', '3', '+', '(', '2', '*', '5', ')', ')']

You could also use re.findall or re.finditer, but then you'd be silently skipping non-matches, which are syntax errors in this case.
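To make that difference concrete, here is the same loop wrapped in a function (the name `tokenize_strict` is illustrative). As an assumption not in the original, it ignores trailing whitespace, which the loop above would otherwise report as a syntax error because `\s*` must be followed by a token:

```python
import re

# Optional leading whitespace, then one token: a parenthesis, an operator or an integer
TOKEN = re.compile(r'\s*([()+*/-]|\d+)')


def tokenize_strict(text):
    tokens = []
    pos = 0
    # Assumption: trailing whitespace is harmless, so stop at the last non-space char
    end = len(text.rstrip())
    while pos < end:
        match = TOKEN.match(text, pos)  # anchored at pos, so nothing is skipped
        if match is None:
            # Point the message at the offending character, not the spaces before it
            bad = len(text) - len(text[pos:].lstrip())
            raise SyntaxError('unexpected %r at position %d' % (text[bad], bad))
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


print(tokenize_strict('(3+(2*5))  '))
print(re.findall(r'\d+|[()+*/-]', '3 % 4'))  # ['3', '4']: the '%' vanishes silently
```

With the strict version, `tokenize_strict('3 % 4')` raises `SyntaxError` instead.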

Answer 3

If you don't want to use the re module, you can try this:

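s = "((81 * 6) /42+ (3-1))"

r = [""]  # dummy first token so that r[-1] always exists

for i in s.replace(" ", ""):
    if i.isdigit() and r[-1].isdigit():
        r[-1] = r[-1] + i  # extend the number currently being built
    else:
        r.append(i)  # start a new token
print(r[1:])  # drop the dummy token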

Output:

['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
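Note that removing all spaces first joins separate numbers, so `"7 3"` becomes `73`. A sketch of a variant (also without `re`) that groups consecutive digits with `itertools.groupby` and only drops spaces between tokens; `tokenize_groupby` is an illustrative name:

```python
from itertools import groupby


def tokenize_groupby(s):
    tokens = []
    # groupby() yields runs of consecutive characters with the same key,
    # here: "is a digit" vs. "is not a digit"
    for is_digit, run in groupby(s, key=str.isdigit):
        if is_digit:
            tokens.append(''.join(run))  # a run of digits is one number
        else:
            # Every other character is its own token; spaces are dropped
            tokens.extend(c for c in run if not c.isspace())
    return tokens


print(tokenize_groupby("((81 * 6) /42+ (3-1))"))
```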

Answer 4

It would actually be pretty trivial to hand-roll a simple expression tokenizer, and I think you'd learn more that way as well.

So, for the sake of education and learning, here is a trivial expression tokenizer implementation which can be extended. It works based on the "maximal-munch" rule. This means it acts greedily, trying to consume as many characters as it can to construct each token.

Without further ado, here is the tokenizer:

class ExpressionTokenizer:
    def __init__(self, expression, operators):
        self.buffer = expression
        self.pos = 0
        self.operators = operators

    def _next_token(self):
        atom = self._get_atom()

        while atom and atom.isspace():
            self._skip_whitespace()
            atom = self._get_atom()

        if atom is None:
            return None
        elif atom.isdigit():
            return self._tokenize_number()
        elif atom in self.operators:
            return self._tokenize_operator()
        else:
            raise SyntaxError('unexpected character %r at position %d' % (atom, self.pos))

    def _skip_whitespace(self):
        while self._get_atom():
            if self._get_atom().isspace():
                self.pos += 1
            else:
                break

    def _tokenize_number(self):
        endpos = self.pos + 1
        while self._get_atom(endpos) and self._get_atom(endpos).isdigit():
            endpos += 1
        number = self.buffer[self.pos:endpos]
        self.pos = endpos
        return number

    def _tokenize_operator(self):
        operator = self.buffer[self.pos]
        self.pos += 1
        return operator

    def _get_atom(self, pos=None):
        # Compare with None explicitly: `pos or self.pos` would ignore pos=0
        if pos is None:
            pos = self.pos
        try:
            return self.buffer[pos]
        except IndexError:
            return None

    def tokenize(self):
        while True:
            token = self._next_token()
            if token is None:
                break
            else:
                yield token

Here is a demo of the usage:

tokenizer = ExpressionTokenizer('((81 * 6) /42+ (3-1))', {'+', '-', '*', '/', '(', ')'})
for token in tokenizer.tokenize():
    print(token)

This produces the output:

(
(
81
*
6
)
/
42
+
(
3
-
1
)
)
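The maximal-munch rule only matters once an operator is longer than one character. The standalone function below is a sketch of that extension, not a change to the class above; it assumes a hypothetical `**` power operator and tries longer operators first, so `2**3` yields `**` rather than two `*` tokens (`tokenize_maximal_munch` is an illustrative name):

```python
def tokenize_maximal_munch(text, operators):
    # Try longer operators first so that '**' wins over '*' (maximal munch)
    ops = sorted(operators, key=len, reverse=True)
    tokens = []
    pos = 0
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            pos += 1
        elif ch.isdigit():
            end = pos
            while end < len(text) and text[end].isdigit():
                end += 1  # greedily consume the whole number
            tokens.append(text[pos:end])
            pos = end
        else:
            for op in ops:
                if text.startswith(op, pos):
                    tokens.append(op)
                    pos += len(op)
                    break
            else:  # no operator matched at this position
                raise SyntaxError('unexpected %r at position %d' % (ch, pos))
    return tokens


# Assumption: '**' is a hypothetical extra operator for illustration
ops = {'+', '-', '*', '/', '**', '(', ')'}
print(tokenize_maximal_munch('2**3 * (4-1)', ops))
```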

Answer 5

Quick regex answer: re.findall(r"\d+|[()+\-*\/]", str_in)

Demonstration:

>>> import re
>>> str_in = "((81 * 6) /42+ (3-1))"
>>> re.findall(r"\d+|[()+\-*\/]", str_in)
['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

For the nested parentheses part, you can use a stack to keep track of the nesting level.
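One way to apply that stack, sketched here as an illustration (the `nest` helper is hypothetical), is to build nested lists from the flat token list: each `(` pushes a new list, each `)` pops it and attaches it to its parent, and `len(stack) - 1` is the current nesting level:

```python
import re


def nest(tokens):
    """Turn a flat token list into nested lists, one per parenthesized group."""
    stack = [[]]  # stack[-1] is the group currently being filled
    for tok in tokens:
        if tok == '(':
            stack.append([])  # entering a deeper level
        elif tok == ')':
            if len(stack) == 1:
                raise ValueError('unmatched )')
            group = stack.pop()  # leaving a level: attach it to its parent
            stack[-1].append(group)
        else:
            stack[-1].append(tok)
    if len(stack) != 1:
        raise ValueError('unmatched (')
    return stack[0]


tokens = re.findall(r"\d+|[()+\-*\/]", "((81 * 6) /42+ (3-1))")
print(nest(tokens))
```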

Answer 6

This does not give quite the result you want, but it might be of interest to others who view this question. It uses the pyparsing library.

# Stolen from http://pyparsing.wikispaces.com/file/view/simpleArith.py/30268305/simpleArith.py
# Copyright 2006, by Paul McGuire
# ... and slightly altered

from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
variable = Word(alphas,exact=1)
operand = integer | variable

expop = Literal('^')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
factop = Literal('!')

# infixNotation is the current name of the deprecated operatorPrecedence
expr = infixNotation(operand,
    [("!", 1, opAssoc.LEFT),
     ("^", 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT),]
    )

print(expr.parseString('((81 * 6) /42+ (3-1))'))

Output:

[[[[81, '*', 6], '/', 42], '+', [3, '-', 1]]]

Answer 7

Using grako:

start = expr $;
expr = calc | value;
calc = value operator value;
value = integer | "(" @:expr ")" ;
operator = "+" | "-" | "*" | "/";
integer = /\d+/;

grako transpiles this grammar to Python.

For this example, the return value looks like this:

['73', '+', ['34', '-', '72', '/', ['33', '-', '3']], '+', ['56', '+', ['95', '-', '28']]]

Normally you'd use the generated semantics class as a template for further processing.
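For comparison, here is a hand-written recursive-descent parser that follows the grammar above rule by rule. It is an illustration only, not the code grako generates, and its nested-list output format is an assumption; `parse` is an illustrative name. Like the grammar, it accepts one operator per parenthesized group, so `1 + 2 + 3` is rejected:

```python
import re

OPERATORS = ('+', '-', '*', '/')


def parse(text):
    # Integers, or any single non-space character
    tokens = re.findall(r'\d+|\S', text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = peek()
        if tok is None:
            raise SyntaxError('unexpected end of input')
        pos += 1
        return tok

    def expr():
        # expr = calc | value;  calc = value operator value;
        left = value()
        if peek() in OPERATORS:
            op = take()
            return [left, op, value()]
        return left

    def value():
        # value = integer | "(" @:expr ")";  (@: keeps only the inner expr)
        tok = take()
        if tok.isdigit():
            return tok
        if tok == '(':
            inner = expr()
            if take() != ')':
                raise SyntaxError("expected ')'")
            return inner
        raise SyntaxError('unexpected %r' % tok)

    result = expr()
    if peek() is not None:  # start = expr $;  the whole input must be consumed
        raise SyntaxError('unexpected %r' % peek())
    return result


print(parse('((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )'))
```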

Answer 8

Here is a more verbose regex approach that you could easily extend:

import re

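solution = []
# A run of digits and dots, e.g. '73' or '3.14'; the capturing group makes
# re.split() keep the numbers in its result
pattern = re.compile(r'([\d.]+)')

s = '((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )'

for token in re.split(pattern, s):
    token = token.strip()
    if re.match(pattern, token):
        # A number: store it as a float so that decimals are supported too
        solution.append(float(token))
        continue
    # Anything else: every non-space character becomes its own token
    for character in token.replace(' ', ''):
        solution.append(character)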

This gives you the result (numbers are floats because of float(token)):

solution = ['(', '(', 73.0, '+', '(', '(', 34.0, '-', 72.0, ')', '/', '(', 33.0, '-', 3.0, ')', ')', ')', '+', '(', 56.0, '+', '(', 95.0, '-', 28.0, ')', ')', ')']
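The resulting tokens can be fed into the infix-to-postfix conversion the question is about. A minimal sketch of Dijkstra's shunting-yard algorithm, assuming only `+ - * /` with the usual precedence and left associativity (`shunting_yard` is an illustrative name); operands, whether strings like `'3'` or floats like `73.0`, pass straight to the output:

```python
# Assumed precedence: * and / bind tighter than + and -; all are left-associative
PRECEDENCE = {'+': 1, '-': 1, '*': 2, '/': 2}


def shunting_yard(tokens):
    """Convert an infix token list to postfix using the shunting-yard algorithm."""
    output, stack = [], []
    for tok in tokens:
        if tok == '(':
            stack.append(tok)
        elif tok == ')':
            # Pop operators until the matching '('
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            if not stack:
                raise ValueError('unmatched )')
            stack.pop()  # discard the '('
        elif tok in PRECEDENCE:
            # Left-associative: first pop operators of higher or equal precedence
            while stack and stack[-1] != '(' and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # operand: goes straight to the output
    while stack:
        top = stack.pop()
        if top == '(':
            raise ValueError('unmatched (')
        output.append(top)
    return output


print(shunting_yard(['3', '+', '4', '*', '2']))
```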

Answer 9

Similar to @McGrady's answer, you can do this with a basic queue implementation. As a very basic implementation, here's what your Queue class could look like:

class Queue:

    EMPTY_QUEUE_ERR_MSG = "Cannot do this operation on an empty queue."

    def __init__(self):
        self._items = []

    def __len__(self) -> int:
        return len(self._items)

    def is_empty(self) -> bool:
        return len(self) == 0

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        try:
            return self._items.pop(0)
        except IndexError:
            raise RuntimeError(Queue.EMPTY_QUEUE_ERR_MSG)

    def peek(self):
        try:
            return self._items[0]
        except IndexError:
            raise RuntimeError(Queue.EMPTY_QUEUE_ERR_MSG)

Using this simple class, you can implement your parse function as:

def tokenize_with_queue(exp: str) -> list:
    queue = Queue()
    cum_digit = ""  # digits of the number currently being read
    for c in exp.replace(" ", ""):
        if c in ["(", ")", "+", "-", "/", "*"]:
            # An operator or parenthesis ends any pending number
            if cum_digit != "":
                queue.enqueue(cum_digit)
                cum_digit = ""
            queue.enqueue(c)
        elif c.isdigit():
            cum_digit += c
        else:
            raise ValueError("unexpected character: %r" % c)
    if cum_digit != "":  # one last sweep in case there are any digits waiting
        queue.enqueue(cum_digit)
    return [queue.dequeue() for _ in range(len(queue))]

Testing it as below:

exp = "((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )"
print(tokenize_with_queue(exp))

gives you the token list:

['(', '(', '73', '+', '(', '(', '34', '-', '72', ')', '/', '(', '33', '-', '3', ')', ')', ')', '+', '(', '56', '+', '(', '95', '-', '28', ')', ')', ')']
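One caveat with this `Queue`: `list.pop(0)` shifts every remaining element, so each `dequeue` is O(n). A sketch of the same interface backed by `collections.deque`, whose `popleft()` is O(1) (`DequeQueue` is an illustrative name):

```python
from collections import deque


class DequeQueue:

    EMPTY_QUEUE_ERR_MSG = "Cannot do this operation on an empty queue."

    def __init__(self):
        self._items = deque()

    def __len__(self) -> int:
        return len(self._items)

    def is_empty(self) -> bool:
        return len(self) == 0

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        try:
            return self._items.popleft()  # O(1), unlike list.pop(0)
        except IndexError:
            raise RuntimeError(DequeQueue.EMPTY_QUEUE_ERR_MSG)

    def peek(self):
        try:
            return self._items[0]
        except IndexError:
            raise RuntimeError(DequeQueue.EMPTY_QUEUE_ERR_MSG)
```

Because the interface is identical, it could replace `Queue` inside `tokenize_with_queue` without other changes.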

How to split a string of a mathematical expression in Python?

Question

9 answers

Answer 1
22 (accepted) 2017-04-13 10:31:40

Tree with `ast`

List with a regex

Answer 2
12 2017-04-13 10:35:07

Answer 3
5 2017-04-13 11:06:15

Answer 4
5 2017-04-13 16:05:44

Answer 5
2 2017-04-13 10:35:17

Answer 6
2 2017-04-13 17:05:04

Answer 7
2 2017-04-19 15:04:06

Answer 8
1 2017-04-14 04:48:09

Answer 9
0 2020-04-14 01:52:25
