Parsing logical expressions
I have a task where I have to filter a Pandas DataFrame based on a user-specified logical expression. I've seen modules called PyParser and LARK that I would like to use, but I cannot figure out how to set them up.
I have several operators like CONTAINS, EQUAL, FUZZY_MATCH etc. Also, I'd like to combine some expressions into more complex ones.
Example expression:
ColumnA CONTAINS [1, 2, 3] AND (ColumnB FUZZY_MATCH 'bla' OR ColumnC EQUAL 45)
As a result, I'd like to get a structured dict or list that nests the operations in the order they should be executed. The desired result for this example expression would be something like:
[['ColumnA', 'CONTAINS', '[1, 2, 3]'], 'AND', [['ColumnB', 'FUZZY_MATCH', 'bla'], 'OR', ['ColumnC', 'EQUAL', '45']]]
or in the form of a dict:
{
    'EXPR1': {
        'col': 'ColumnA',
        'oper': 'CONTAINS',
        'value': '[1, 2, 3]'
    },
    'OPERATOR': 'AND',
    'EXPR2': {
        'EXPR21': {
            'col': 'ColumnB',
            'oper': 'FUZZY_MATCH',
            'value': 'bla'
        },
        'OPERATOR': 'OR',
        'EXPR22': {
            'col': 'ColumnC',
            'oper': 'EQUAL',
            'value': '45'
        }
    }
}
Or something like that. If you have a better way of structuring the result, I'm open to suggestions. I'm pretty new to this, so I'm fairly certain this can be improved.
Interesting problem :)
This seems like a relatively straightforward application of the shunting yard algorithm.
I had written code to parse expressions like "((20 - 10 ) * (30 - 20) / 10 + 10 ) * 2":
import re

def tokenize(expression):
    # Split the input into operators, parentheses and integer literals.
    return re.findall(r"[+/*()-]|\d+", expression)

def is_number(token):
    try:
        int(token)
        return True
    except ValueError:
        return False

def peek(stack):
    return stack[-1] if stack else None

def apply_operator(operators, values):
    # Pop one operator and its two operands, then push the result.
    operator = operators.pop()
    right = values.pop()
    left = values.pop()
    values.append(eval("{0}{1}{2}".format(left, operator, right)))

def greater_precedence(op1, op2):
    precedences = {"+": 0, "-": 0, "*": 1, "/": 1}
    # >= rather than > so that operators of equal precedence are applied
    # left to right (otherwise "10 - 5 - 2" would be evaluated as 10 - (5 - 2)).
    return precedences[op1] >= precedences[op2]

def evaluate(expression):
    tokens = tokenize(expression)
    values = []
    operators = []
    for token in tokens:
        if is_number(token):
            values.append(int(token))
        elif token == "(":
            operators.append(token)
        elif token == ")":
            top = peek(operators)
            while top is not None and top != "(":
                apply_operator(operators, values)
                top = peek(operators)
            operators.pop()  # Discard the '('
        else:
            # Operator
            top = peek(operators)
            while top is not None and top != "(" and greater_precedence(top, token):
                apply_operator(operators, values)
                top = peek(operators)
            operators.append(token)
    # Apply whatever operators are still pending.
    while peek(operators) is not None:
        apply_operator(operators, values)
    return values[0]

def main():
    expression = "((20 - 10 ) * (30 - 20) / 10 + 10 ) * 2"
    print(evaluate(expression))

if __name__ == "__main__":
    main()
I reckon we can modify the code slightly to make it work for your case:
1. We need to modify the way the input string is tokenized in tokenize(). Basically, given the string ColumnA CONTAINS [1, 2, 3] AND (ColumnB FUZZY_MATCH 'bla' OR ColumnC EQUAL 45), we want a list of tokens: ['ColumnA', 'CONTAINS', '[1, 2, 3]', 'AND', '(', 'ColumnB', 'FUZZY_MATCH', "'bla'", 'OR', 'ColumnC', 'EQUAL', '45', ')'].
2. Modify the is_number() function to instead detect things like ColumnA, [1, 2, 3] etc. Basically, everything except the predicates CONTAINS / FUZZY_MATCH / EQUAL, the operators AND / OR and the parentheses ( / ).
3. Modify greater_precedence(op1, op2) to return true when op1 is among ['CONTAINS', 'EQUAL', ..] and op2 is one of ['AND', 'OR']. This is because we want CONTAINS and EQUAL to always be evaluated before AND / OR.
4. Modify apply_operator(operators, values) to implement the logic for evaluating a boolean expression such as ColumnA CONTAINS [1, 2, 3] or the expression true AND false. Remember that CONTAINS / FUZZY_MATCH / EQUAL / AND / OR etc. are all operators here.
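The four modifications above can be sketched roughly as follows. This is only a minimal illustration, not a finished implementation: the names PRECEDENCE, TOKEN_RE, is_operand and parse are made up for this sketch, the token regex assumes values are either bare words, single-quoted strings or bracketed lists without nesting, and giving AND a higher precedence than OR is an assumption borrowed from the usual boolean convention. Instead of evaluating anything, apply_operator here just builds the nested list asked for in the question.

```python
import re

# Assumed precedence table: predicates bind tighter than AND, AND tighter than OR.
PRECEDENCE = {"OR": 0, "AND": 1, "CONTAINS": 2, "EQUAL": 2, "FUZZY_MATCH": 2}

# Bracketed lists and quoted strings are tried before bare words,
# so "[1, 2, 3]" and "'bla'" each come out as a single token.
TOKEN_RE = re.compile(r"\[[^\]]*\]|'[^']*'|[()]|[^\s()\[\]']+")


def tokenize(expression):
    return TOKEN_RE.findall(expression)


def is_operand(token):
    # Everything that is not an operator or a parenthesis is a column or value.
    return token not in PRECEDENCE and token not in ("(", ")")


def apply_operator(operators, values):
    # Build a [left, operator, right] node instead of computing a result.
    operator = operators.pop()
    right = values.pop()
    left = values.pop()
    values.append([left, operator, right])


def parse(expression):
    values, operators = [], []
    for token in tokenize(expression):
        if is_operand(token):
            values.append(token.strip("'"))  # drop quotes around string values
        elif token == "(":
            operators.append(token)
        elif token == ")":
            while operators and operators[-1] != "(":
                apply_operator(operators, values)
            operators.pop()  # discard the "("
        else:
            while (operators and operators[-1] != "("
                   and PRECEDENCE[operators[-1]] >= PRECEDENCE[token]):
                apply_operator(operators, values)
            operators.append(token)
    while operators:
        apply_operator(operators, values)
    return values[0]


if __name__ == "__main__":
    print(parse("ColumnA CONTAINS [1, 2, 3] AND "
                "(ColumnB FUZZY_MATCH 'bla' OR ColumnC EQUAL 45)"))
```

For the example expression this produces [['ColumnA', 'CONTAINS', '[1, 2, 3]'], 'AND', [['ColumnB', 'FUZZY_MATCH', 'bla'], 'OR', ['ColumnC', 'EQUAL', '45']]]. The nested list can then be walked recursively to build the actual DataFrame filter, with each predicate mapped to whatever comparison you want it to mean.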