How can I split a string of mathematical expressions in Python?

I made a program that converts infix to postfix in Python. The problem arises when I pass in the arguments. If I enter something like this (as a string):

( ( 73 + ( ( 34 - 72 ) / ( 33 - 3 ) ) ) + ( 56 + ( 95 - 28 ) ) )

the program splits it with .split() and works correctly. But I want the user to be able to enter something like this:

((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )

As you can see, I want whitespace to be insignificant, while the program still splits the string into parentheses, integers (not individual digits) and operators.

I tried to solve it with a for loop, but I don't know how to capture whole numbers (73, 34, 72) instead of individual digits (7, 3, 3, 4, 7, 2).

To sum up, I want to split a string like ((81 * 6) /42+ (3-1)) into:

[(, (, 81, *, 6, ), /, 42, +, (, 3, -, 1, ), )]

Tree with ast

You could use ast to get a tree of the expression:

import ast

source = '((81 * 6) /42+ (3-1))'
node = ast.parse(source) 

def show_children(node, level=0):
    if isinstance(node, ast.Constant):  # ast.Num was removed in Python 3.12
        print(' ' * level + str(node.value))
    else:
        print(' ' * level + str(node))
    for child in ast.iter_child_nodes(node):
        show_children(child, level+1)

show_children(node)

It outputs (object addresses, and on newer Python versions the exact repr, will differ):

<_ast.Module object at 0x7f56abbc5490>
 <_ast.Expr object at 0x7f56abbc5350>
  <_ast.BinOp object at 0x7f56abbc5450>
   <_ast.BinOp object at 0x7f56abbc5390>
    <_ast.BinOp object at 0x7f56abb57cd0>
     81
     <_ast.Mult object at 0x7f56abbd0dd0>
     6
    <_ast.Div object at 0x7f56abbd0e50>
    42
   <_ast.Add object at 0x7f56abbd0cd0>
   <_ast.BinOp object at 0x7f56abb57dd0>
    3
    <_ast.Sub object at 0x7f56abbd0d50>
    1

As @user2357112 wrote in the comments: ast.parse interprets Python syntax, not mathematical expressions. (1+2)(3+4) would be parsed as a function call, and list comprehensions would be accepted even though they probably shouldn't be considered valid mathematical expressions.
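One way to work around that caveat is to walk the parsed tree and reject every node type that is not plain arithmetic. The following is only a sketch: the helper name is_plain_arithmetic and the whitelist ALLOWED_NODES are assumptions made for illustration, not part of the answer above.

```python
import ast

# Node types treated as "plain arithmetic" in this sketch (an assumption;
# widen or narrow the tuple as needed).
ALLOWED_NODES = (
    ast.Expression, ast.BinOp, ast.UnaryOp, ast.Constant,
    ast.Add, ast.Sub, ast.Mult, ast.Div, ast.UAdd, ast.USub,
)

def is_plain_arithmetic(source):
    """Return True if source uses only numbers, + - * / and parentheses."""
    try:
        tree = ast.parse(source, mode='eval')
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED_NODES):
            return False
        # ast.Constant also covers strings, None and booleans, so check the value.
        if isinstance(node, ast.Constant) and (
                isinstance(node.value, bool)
                or not isinstance(node.value, (int, float))):
            return False
    return True

print(is_plain_arithmetic('((81 * 6) /42+ (3-1))'))  # True
print(is_plain_arithmetic('(1+2)(3+4)'))             # False: parsed as a call
print(is_plain_arithmetic('[x for x in range(3)]'))  # False: list comprehension
```

The function call and the list comprehension are rejected because ast.Call and ast.ListComp are not in the whitelist.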

List with a regex

If you want a flat structure, a regex could work:

import re

number_or_symbol = re.compile(r'(\d+|[^ 0-9])')  # raw string avoids the invalid '\d' escape
print(re.findall(number_or_symbol, source))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

It looks for either:

  • one or more consecutive digits
  • or any single character which isn't a digit or a space

Once you have a list of elements, you could check whether the syntax is correct, for example by using a stack to check that the parentheses match, or by checking that every element is a known one.
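The check described above could look roughly like this. It is a minimal sketch: the function name check_tokens and the operator set are assumptions, and it only checks parenthesis balance and token validity, not full expression grammar.

```python
def check_tokens(tokens):
    """Raise ValueError on unknown tokens or unbalanced parentheses."""
    operators = {'+', '-', '*', '/'}
    stack = []  # indexes of '(' tokens that are still open
    for index, token in enumerate(tokens):
        if token == '(':
            stack.append(index)
        elif token == ')':
            if not stack:
                raise ValueError(f"unmatched ')' at token {index}")
            stack.pop()
        elif not (token.isdigit() or token in operators):
            raise ValueError(f"unknown token {token!r} at token {index}")
    if stack:
        raise ValueError(f"unclosed '(' at token {stack[-1]}")
    return True

tokens = ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
print(check_tokens(tokens))  # True
```

Every '(' pushes its position and every ')' pops one, so a leftover entry or an empty pop pinpoints the unbalanced parenthesis.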

You need to implement a very simple tokenizer for your input. You have the following types of tokens:

  • (
  • )
  • +
  • -
  • *
  • /
  • \d+ (one or more digits)

You can find them in your input string, separated by all sorts of whitespace.

So a first step is to process the string from start to finish and extract these tokens, and then do your parsing on the tokens rather than on the string itself.

A nifty way to do this is to use the following regular expression: r'\s*([()+*/-]|\d+)'. You can then:

import re

the_input='(3+(2*5))'
tokens = []
tokenizer = re.compile(r'\s*([()+*/-]|\d+)')
current_pos = 0
while current_pos < len(the_input):
    match = tokenizer.match(the_input, current_pos)
    if match is None:
        # no token starts here, so the input is malformed
        raise SyntaxError('Syntax error at position %d' % current_pos)
    tokens.append(match.group(1))
    current_pos = match.end()
print(tokens)

This will print ['(', '3', '+', '(', '2', '*', '5', ')', ')'].

You could also use re.findall or re.finditer, but then you'd be skipping non-matches, which are syntax errors in this case.
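To illustrate the point about skipped non-matches: re.findall silently drops a stray character, whereas re.finditer can still be made strict by adding a catch-all alternative. This is a sketch under assumptions: the group names NUMBER, SYMBOL, SPACE and MISMATCH and the function name tokenize are chosen here for illustration.

```python
import re

# findall simply ignores the invalid '$':
print(re.findall(r'\d+|[()+*/-]', '3 $ 4'))  # ['3', '4']

# Named alternatives; MISMATCH is a catch-all so nothing is skipped silently.
TOKEN_RE = re.compile(r'''
    (?P<NUMBER>\d+)
  | (?P<SYMBOL>[()+*/-])
  | (?P<SPACE>\s+)
  | (?P<MISMATCH>.)
''', re.VERBOSE)

def tokenize(text):
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the alternative that matched
        if kind == 'SPACE':
            continue
        if kind == 'MISMATCH':
            raise SyntaxError(
                f'unexpected {match.group()!r} at position {match.start()}')
        tokens.append(match.group())
    return tokens

print(tokenize('(3+(2*5))'))  # ['(', '3', '+', '(', '2', '*', '5', ')', ')']
```

Because every character matches one of the four alternatives, finditer never skips input, and the MISMATCH branch turns bad characters into errors.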

If you don't want to use the re module, you can try this:

s="((81 * 6) /42+ (3-1))"

r=[""]

for i in s.replace(" ",""):
    if i.isdigit() and r[-1].isdigit():
        r[-1]=r[-1]+i
    else:
        r.append(i)
print(r[1:])

Output:

['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

It would actually be pretty trivial to hand-roll a simple expression tokenizer. And I think you'd learn more that way as well.

So, for the sake of education and learning, here is a trivial expression tokenizer implementation which can be extended. It works based on the "maximal munch" rule. This means it acts "greedily", trying to consume as many characters as it can to construct each token.

Without further ado, here is the tokenizer:

class ExpressionTokenizer:
    def __init__(self, expression, operators):
        self.buffer = expression
        self.pos = 0
        self.operators = operators

    def _next_token(self):
        atom = self._get_atom()

        while atom and atom.isspace():
            self._skip_whitespace()
            atom = self._get_atom()

        if atom is None:
            return None
        elif atom.isdigit():
            return self._tokenize_number()
        elif atom in self.operators:
            return self._tokenize_operator()
        else:
            raise SyntaxError()

    def _skip_whitespace(self):
        while self._get_atom():
            if self._get_atom().isspace():
                self.pos += 1
            else:
                break

    def _tokenize_number(self):
        endpos = self.pos + 1
        while self._get_atom(endpos) and self._get_atom(endpos).isdigit():
            endpos += 1
        number = self.buffer[self.pos:endpos]
        self.pos = endpos
        return number

    def _tokenize_operator(self):
        operator = self.buffer[self.pos]
        self.pos += 1
        return operator

    def _get_atom(self, pos=None):
        # explicit None check, so that position 0 is not mistaken for "no argument"
        pos = self.pos if pos is None else pos
        try:
            return self.buffer[pos]
        except IndexError:
            return None

    def tokenize(self):
        while True:
            token = self._next_token()
            if token is None:
                break
            else:
                yield token

Here is a demo of the usage:

tokenizer = ExpressionTokenizer('((81 * 6) /42+ (3-1))', {'+', '-', '*', '/', '(', ')'})
for token in tokenizer.tokenize():
    print(token)

Which produces the output:

(
(
81
*
6
)
/
42
+
(
3
-
1
)
)
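Since the question's goal is infix-to-postfix conversion, a token stream like the one above can be fed straight into that step. The following is a minimal shunting-yard sketch, not the asker's program: the function name to_postfix and the precedence table are assumptions, and all four operators are treated as left-associative.

```python
def to_postfix(tokens):
    """Convert infix tokens (strings) to a postfix token list."""
    precedence = {'+': 1, '-': 1, '*': 2, '/': 2}
    output, stack = [], []
    for token in tokens:
        if token.isdigit():
            output.append(token)
        elif token in precedence:
            # pop operators that bind at least as tightly (left-associative)
            while (stack and stack[-1] != '('
                   and precedence[stack[-1]] >= precedence[token]):
                output.append(stack.pop())
            stack.append(token)
        elif token == '(':
            stack.append(token)
        elif token == ')':
            while stack and stack[-1] != '(':
                output.append(stack.pop())
            if not stack:
                raise SyntaxError("unmatched ')'")
            stack.pop()  # discard the '('
        else:
            raise SyntaxError(f'unknown token {token!r}')
    while stack:
        operator = stack.pop()
        if operator == '(':
            raise SyntaxError("unmatched '('")
        output.append(operator)
    return output

tokens = ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
print(' '.join(to_postfix(tokens)))  # 81 6 * 42 / 3 1 - +
```

The function accepts any iterable of string tokens, so it could also consume the generator returned by tokenize() directly.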

Quick regex answer: re.findall(r"\d+|[()+\-*\/]", str_in)

Demonstration:

>>> import re
>>> str_in = "((81 * 6) /42+ (3-1))"
>>> re.findall(r"\d+|[()+\-*\/]", str_in)
['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']

For the nested parentheses part, you can use a stack to keep track of the level.
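As a rough illustration of that stack idea, the flat token list can be regrouped into nested lists, one list per pair of parentheses. The function name nest is an assumption made for this sketch.

```python
def nest(tokens):
    """Group a flat token list into nested lists, one per parenthesis pair."""
    stack = [[]]  # stack[-1] is the list for the current nesting level
    for token in tokens:
        if token == '(':
            stack.append([])          # go one level deeper
        elif token == ')':
            if len(stack) == 1:
                raise SyntaxError("unmatched ')'")
            finished = stack.pop()    # close the current level
            stack[-1].append(finished)
        else:
            stack[-1].append(token)
    if len(stack) != 1:
        raise SyntaxError("unmatched '('")
    return stack[0]

tokens = ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
print(nest(tokens))
# [[['81', '*', '6'], '/', '42', '+', ['3', '-', '1']]]
```

The current depth is len(stack) - 1 at any point, which is the "level" being tracked.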

This does not provide quite the result you want, but it might be of interest to others who view this question. It makes use of the pyparsing library.

# Stolen from http://pyparsing.wikispaces.com/file/view/simpleArith.py/30268305/simpleArith.py
# Copyright 2006, by Paul McGuire
# ... and slightly altered

from pyparsing import *

integer = Word(nums).setParseAction(lambda t:int(t[0]))
variable = Word(alphas,exact=1)
operand = integer | variable

expop = Literal('^')
signop = oneOf('+ -')
multop = oneOf('* /')
plusop = oneOf('+ -')
factop = Literal('!')

expr = operatorPrecedence( operand,
    [("!", 1, opAssoc.LEFT),
     ("^", 2, opAssoc.RIGHT),
     (signop, 1, opAssoc.RIGHT),
     (multop, 2, opAssoc.LEFT),
     (plusop, 2, opAssoc.LEFT),]
    )

print (expr.parseString('((81 * 6) /42+ (3-1))'))

Output:

[[[[81, '*', 6], '/', 42], '+', [3, '-', 1]]]

Using grako:

start = expr $;
expr = calc | value;
calc = value operator value;
value = integer | "(" @:expr ")" ;
operator = "+" | "-" | "*" | "/";
integer = /\d+/;

grako transpiles the grammar to Python.

For this example, the return value looks like this:

['73', '+', ['34', '-', '72', '/', ['33', '-', '3']], '+', ['56', '+', ['95', '-', '28']]]

Normally you'd use the generated semantics class as a template for further processing.

To provide a more verbose regex approach that you could easily extend:

import re

solution = []
pattern = re.compile(r'([\d\.]+)')  # raw string avoids the invalid '\d' escape

s = '((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )'

for token in re.split(pattern, s):
    token = token.strip()
    if re.match(pattern, token):
        solution.append(float(token))
        continue
    for character in re.sub(' ', '', token):
        solution.append(character)

Which will give you the result (numbers are floats, because the loop calls float(token)):

solution = ['(', '(', 73.0, '+', '(', '(', 34.0, '-', 72.0, ')', '/', '(', 33.0, '-', 3.0, ')', ')', ')', '+', '(', 56.0, '+', '(', 95.0, '-', 28.0, ')', ')', ')']

Similar to @McGrady's answer, you can do this with a basic queue implementation. As a very basic implementation, here's what your Queue class could look like:

class Queue:

    EMPTY_QUEUE_ERR_MSG = "Cannot do this operation on an empty queue."

    def __init__(self):
        self._items = []

    def __len__(self) -> int:
        return len(self._items)

    def is_empty(self) -> bool:
        return len(self) == 0

    def enqueue(self, item):
        self._items.append(item)

    def dequeue(self):
        try:
            return self._items.pop(0)
        except IndexError:
            raise RuntimeError(Queue.EMPTY_QUEUE_ERR_MSG)

    def peek(self):
        try:
            return self._items[0]
        except IndexError:
            raise RuntimeError(Queue.EMPTY_QUEUE_ERR_MSG)

Using this simple class, you can implement your parse function as:

from typing import List  # needed for the return annotation

def tokenize_with_queue(exp: str) -> List[str]:
    queue = Queue()
    cum_digit = ""
    for c in exp.replace(" ", ""):
        if c in ["(", ")", "+", "-", "/", "*"]:
            if cum_digit != "":
                queue.enqueue(cum_digit)
                cum_digit = ""
            queue.enqueue(c)
        elif c.isdigit():
            cum_digit += c
        else:
            raise ValueError
    if cum_digit != "": #one last sweep in case there are any digits waiting
        queue.enqueue(cum_digit)
    return [queue.dequeue() for i in range(len(queue))]

Testing it like below:

exp = "((73 + ( (34- 72 ) / ( 33 -3) )) + (56 +(95 - 28) ) )"
print(tokenize_with_queue(exp))

would give you the token list as:

['(', '(', '73', '+', '(', '(', '34', '-', '72', ')', '/', '(', '33', '-', '3', ')', ')', ')', '+', '(', '56', '+', '(', '95', '-', '28', ')', ')', ')']
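As a possible variation, the standard library's collections.deque can stand in for the hand-written Queue, since its popleft() is O(1) while list.pop(0) has to shift every remaining item. This is a sketch with an assumed function name, tokenize_with_deque. Unlike the version above, it treats whitespace as a number boundary, so "8 1" yields two tokens rather than "81".

```python
from collections import deque

SYMBOLS = set("()+-*/")

def tokenize_with_deque(exp):
    """Tokenize exp into parentheses, operators and whole numbers."""
    queue = deque()
    digits = ""  # digits of the number currently being read
    for c in exp:
        if c.isdigit():
            digits += c
            continue
        if digits:  # any non-digit ends the current number
            queue.append(digits)
            digits = ""
        if c in SYMBOLS:
            queue.append(c)
        elif not c.isspace():
            raise ValueError(f"unexpected character {c!r}")
    if digits:  # flush a number at the very end of the input
        queue.append(digits)
    # drain the queue in FIFO order
    return [queue.popleft() for _ in range(len(queue))]

print(tokenize_with_deque("((81 * 6) /42+ (3-1))"))
# ['(', '(', '81', '*', '6', ')', '/', '42', '+', '(', '3', '-', '1', ')', ')']
```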
