
Regex Python exclude some results

Here is a test string:

Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])

I am trying to get all loaded variable names from it. My regexp is

[a-z]+(?=', Load)

It returns the following: (image: regex result). As you can see, it also finds built-in functions such as print and range. How can I exclude them? The values to be excluded are preceded by

Call(Name(' 

I tried

 (?=Call\(Name\(')[a-z]+(?=', Load)

but it did not work out.

My code is:

import re

test = '''Module([Assign([Name('a', Store())], Num(2)), Assign([Name('b', Store())], Num(3)), Assign([Name('c', Store())], Str('Hello')), Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), Assign([Name('x', Store())], Name('a', Load())), Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), For(Name('i', Store()), Call(Name('range', Load()), [Num(10)], [], None, None), [Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))], [])])'''
print(re.findall(r"[a-z]+(?=', Load)", test))
print(re.findall(r"(?=Call\(Name\(')[a-z]+(?=', Load)", test))

Use a negative lookbehind and word boundaries.

(?<!Call\(Name\(')\b\w+\b(?=', Load)

See demo.

https://regex101.com/r/hdxlQ8/1
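As a quick check of the pattern above, here is a minimal sketch that runs it with Python's re module. The dump string is a shortened excerpt of the one in the question (an assumption made for brevity), and the variable names are only illustrative.

```python
import re

# Shortened excerpt of the dump from the question (assumed representative).
dump = (
    "Assign([Name('x', Store())], BinOp(Name('a', Load()), Add(), Name('b', Load()))), "
    "Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None))"
)

# Pattern from the question: every lowercase name followed by ', Load
all_loaded = re.findall(r"[a-z]+(?=', Load)", dump)

# Negative lookbehind: skip names directly preceded by Call(Name('
loaded_not_called = re.compile(r"(?<!Call\(Name\(')\b\w+\b(?=', Load)")
loaded_vars = loaded_not_called.findall(dump)

print(all_loaded)   # ['a', 'b', 'print', 'a']
print(loaded_vars)  # ['a', 'b', 'a']
```

The lookbehind works in Python because Call\(Name\(' has a fixed width, which re requires for lookbehind assertions.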

A negative lookbehind:

(?<!Call\(Name\()'(\w+)(?=', Load)

or

(?<!Call\()Name\('(\w+)', Load
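Both patterns use a capture group, so re.findall returns only the captured name. The sketch below tries them on a short made-up dump; helper is a hypothetical user-defined function added to show that any name used as a call target is skipped, not just built-ins.

```python
import re

# Made-up excerpt in the same format as the question's dump.
# 'helper' is a hypothetical user-defined function.
dump = (
    "Expr(Call(Name('print', Load()), [Name('a', Load())], [], None, None)), "
    "Expr(Call(Name('helper', Load()), [Name('b', Load())], [], None, None))"
)

# Variant 1: the lookbehind guards the opening quote of the name.
pattern_a = re.compile(r"(?<!Call\(Name\()'(\w+)(?=', Load)")
# Variant 2: the lookbehind guards the Name( token itself.
pattern_b = re.compile(r"(?<!Call\()Name\('(\w+)', Load")

# With exactly one group, findall returns the group contents.
names_a = pattern_a.findall(dump)
names_b = pattern_b.findall(dump)

print(names_a)  # ['a', 'b']
print(names_b)  # ['a', 'b']
```

Note that helper is excluded along with print, which may or may not be what you want.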

I used eval() for this. I don't recommend this approach, but you can use it as an alternative.

Here test is the variable that holds the long string, and filtered ends up holding your desired list of values.

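import re

# test holds the long string from the question
candidates = re.findall(r"[a-z]+(?=', Load)", test)

filtered = []
for each in candidates:
    try:
        eval(each)  # succeeds if the name is already defined (e.g. a built-in)
    except NameError:
        filtered.append(each)  # not defined here, so keep it
    except Exception:
        pass  # ignore any other error

print(filtered)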

Output:

['a', 'b', 'a', 'a', 'a']

We try to evaluate each string with eval(). If no variable, function, or class with that name exists, the Python interpreter raises a NameError, which tells us the name is not defined, so we append the string to the filtered list.

PS: Any other exceptions, such as TypeError, are ignored.
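If the goal is only to drop built-in names, the same filtering idea can be sketched without eval() by checking each candidate against the builtins module. This is a swapped-in technique, not the method above, and it assumes Python 3 (where the module is called builtins). Unlike eval(), it does not exclude names that merely happen to be defined in your own script.

```python
import builtins
import re

# Shortened excerpt of the dump from the question (assumed representative).
dump = (
    "BinOp(Name('a', Load()), Add(), Name('b', Load())), "
    "Call(Name('print', Load()), [Name('a', Load())], [], None, None), "
    "Call(Name('range', Load()), [Num(10)], [], None, None)"
)

candidates = re.findall(r"[a-z]+(?=', Load)", dump)

# Keep only the names that are not attributes of the builtins module.
user_names = [name for name in candidates if not hasattr(builtins, name)]

print(candidates)  # ['a', 'b', 'print', 'a', 'range']
print(user_names)  # ['a', 'b', 'a']
```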

This looks like a parse tree. I would not use regex for this, for countless reasons that others have explained much better in some pretty famous posts (granted, that post is about [X]HTML, but the lesson remains: do not use regular expressions to parse more complex grammars).

My understanding is that ASTs, and in this case the actual concrete parse tree, are described by context-free grammars, so they are not regular and cannot be reliably parsed with regular expressions. Plus, that code is already in a pretty convenient state in terms of walkability. If anything, recreate the objects and walk the tree, knowing the rule that a variable's name is the left-hand terminal of the Assign statement, with its value on the right. This will certainly take less time and cause fewer headaches than using regex.

Do yourself a favor and do not attempt this with regex unless you are dealing with a small, known variety of these.
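As a rough illustration of walking the tree instead, here is a sketch using the standard ast module in Python 3. The source string is a reconstruction of the program that the dump in the question appears to describe (an assumption, since the original source was not posted), and loaded_names is a hypothetical helper name.

```python
import ast

# Reconstructed from the dump in the question; the real source is an assumption.
source = """
a = 2
b = 3
c = 'Hello'
x = a + b
x = a
print(a)
for i in range(10):
    print(a)
"""

def loaded_names(code):
    """Return names that are read (Load context) but not used as a call target."""
    tree = ast.parse(code)
    # Identify Name nodes that are the callee of a call, e.g. print in print(a).
    callees = {
        id(node.func)
        for node in ast.walk(tree)
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
    }
    return [
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name)
        and isinstance(node.ctx, ast.Load)
        and id(node) not in callees
    ]

names = loaded_names(source)
print(sorted(names))  # ['a', 'a', 'a', 'a', 'b']
```

ast.walk does not promise any particular order, so the result is sorted before printing. If source order matters, an ast.NodeVisitor subclass would be the more natural choice.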

For further reading.
