
Python regex: exclude some results

There is a test string:

Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])

I am trying to get all loaded variable names from it. My regexp is:

[a-z]+(?=', Load)

As you can see from the result, it also finds built-in functions such as print and range. How do I exclude them? The values to be excluded are preceded by

Call(Name(' 

I tried

 (?=Call\(Name\(')[a-z]+(?=', Load)

but it did not work.

My code is:

import re

test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''
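# Matches every lowercase name followed by "', Load", including print and range
print(re.findall(r"[a-z]+(?=', Load)", test))
# Attempted exclusion; this prints []
print(re.findall(r"(?=Call\(Name\(')[a-z]+(?=', Load)", test))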
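A likely reason the second pattern returns nothing: a lookahead (?=...) only checks the text to the right of the current position without consuming it, so [a-z]+ has to start at that same position, on the uppercase C of Call, and can never match. Text that should come before the name needs a lookbehind (?<=...) instead. The sketch below compares both on a shortened excerpt of the test string; the excerpt and the variable names are illustrative assumptions, not part of the original question.

```python
import re

# Shortened excerpt of the question's test string (illustrative only)
sample = "Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))"

# Lookahead at the start: asserts "Call(Name('" follows, but [a-z]+ must then
# match starting at the 'C' of Call, which is uppercase, so nothing matches
ahead = re.findall(r"(?=Call\(Name\(')[a-z]+(?=', Load)", sample)

# Positive lookbehind: asserts "Call(Name('" precedes the name,
# which selects exactly the names the question wants to exclude
behind = re.findall(r"(?<=Call\(Name\(')[a-z]+(?=', Load)", sample)

print(ahead)   # []
print(behind)  # ['print']
```

Turning the positive lookbehind into a negative one (?<!...) flips the selection, which is what the answers below do.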

Use a negative lookbehind and word boundaries:

(?<!Call\(Name\(')\b\w+\b(?=', Load)

See demo.

https://regex101.com/r/hdxlQ8/1
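A quick local check of this pattern against the question's full test string, as a minimal sketch assuming Python 3's re module (the variable names are illustrative). The negative lookbehind rejects a name that directly follows Call(Name(', and the leading \b stops the match from restarting in the middle of print or range.

```python
import re

test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''

# Negative lookbehind excludes names called directly, e.g. Call(Name('print', ...
pattern = r"(?<!Call\(Name\(')\b\w+\b(?=', Load)"
result = re.findall(pattern, test)
print(result)  # ['a', 'b', 'a', 'a', 'a']
```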

A negative lookbehind:

(?<!Call\(Name\()'(\w+)(?=', Load)

or

(?<!Call\()Name\('(\w+)', Load
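Both patterns contain exactly one capturing group, so re.findall returns just the captured name rather than the whole match. A minimal sketch, assuming Python 3, that runs both against the question's test string (variable names are illustrative):

```python
import re

test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''

# Lookbehind placed before the opening quote of the name
first = re.findall(r"(?<!Call\(Name\()'(\w+)(?=', Load)", test)

# Lookbehind placed before "Name(", consuming the rest literally
second = re.findall(r"(?<!Call\()Name\('(\w+)', Load", test)

print(first)   # ['a', 'b', 'a', 'a', 'a']
print(second)  # ['a', 'b', 'a', 'a', 'a']
```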

I have used the eval() function for this. I don't recommend this approach, but you can use it as an alternative.

Here, test is the variable holding the long string, and the filtered variable ends up with your desired list of values.

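import re

# test holds the long AST string from the question
names = re.findall(r"[a-z]+(?=', Load)", test)

filtered = []
for each in names:
    try:
        eval(each)  # builtins such as print and range evaluate without error
    except NameError:
        # the name is not defined here, so treat it as a user variable
        filtered.append(each)
    except Exception:
        # any other error is ignored
        pass

print(filtered)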

Output:

['a', 'b', 'a', 'a', 'a']

We try to evaluate each string using eval(). If there is no variable, function, or class with that name, the Python interpreter raises a NameError, which tells us the name is not a built-in, so we append the string to the filtered list. Note that eval() also looks names up in the script's own scope, so a parsed name that happens to match something defined in the script (for example re or test) would be dropped as well.

PS. Any other exceptions (like TypeError) are silently ignored.
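A possibly safer variant of the same idea, swapped in here rather than taken from the answer above: instead of executing each string with eval(), check whether the name exists in Python's builtins module. This never runs arbitrary code and is not affected by names defined in your own script. It is a minimal sketch assuming Python 3, where the module is called builtins.

```python
import builtins
import re

test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''

# Every lowercase name in a Load context, builtins included
names = re.findall(r"[a-z]+(?=', Load)", test)

# Keep only names that are not attributes of the builtins module
filtered = [name for name in names if not hasattr(builtins, name)]
print(filtered)  # ['a', 'b', 'a', 'a', 'a']
```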

This looks like a parse tree. I would not use regex for this, for countless reasons much better explained by others in some pretty famous posts (granted, that post is about [X]HTML, but the lesson remains: do not use regular expressions to parse more complex grammars).

My understanding is that ASTs (and this string is a dump of one) follow context-free grammars, which are not regular, so they cannot be reliably parsed with regular expressions: arbitrarily deep nesting such as Call(Name(...)) inside Expr(...) inside For(...) is exactly what a regex cannot keep track of. Plus, that structure is already very convenient to walk. If anything, recreate the objects and walk the tree, knowing the rule that variable names will be the left-side terminal of the Assign statement, with the value on the right, and that each Name node records whether it is read or written (Load() or Store()). This will certainly take less time and cause fewer headaches than using regex.
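A minimal sketch of the tree-walking approach, under two assumptions: the Python source below is a hypothetical reconstruction of the code that produced the dump, and it is parsed with Python 3's ast module instead of re-creating objects from the dump string (recent Python versions dump literals as Constant rather than Num/Str, so the dump text may not round-trip). LoadedNameCollector is an illustrative name. It collects every Name node whose context is Load and skips names found in the builtins module.

```python
import ast
import builtins

# Hypothetical reconstruction of the source behind the question's AST dump
source = """
a = 2
b = 3
c = 'Hello'
x = a + b
x = a
print(a)
for i in range(10):
    print(a)
"""


class LoadedNameCollector(ast.NodeVisitor):
    """Collects names that are read (Load context) and are not builtins."""

    def __init__(self):
        self.names = []

    def visit_Name(self, node):
        # Name nodes carry their context: Load() when read, Store() when assigned
        if isinstance(node.ctx, ast.Load) and not hasattr(builtins, node.id):
            self.names.append(node.id)
        self.generic_visit(node)


tree = ast.parse(source)
collector = LoadedNameCollector()
collector.visit(tree)
print(collector.names)  # ['a', 'b', 'a', 'a', 'a']
```

NodeVisitor.generic_visit walks each node's fields in their declared order, so the names come out in source order here; ast.walk would also find them, but in breadth-first order.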

Do yourself a favor and do not attempt this with regex unless you are dealing with a small, known variety of these strings.

For further reading.

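Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.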

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM