Regular expression in python - help needed

Like many other people posting questions here, I recently started programming in Python. I'm having trouble defining a regular expression that extracts a variable name from a string, given a list of known variable names. I am parsing code that I read line by line from a file, and I build a list of variables:

>>> variable_list = ['var1', 'var2', 'var4_more', 'var3', 'var1_more']

What I want is a pattern for re.compile that does not report two matches for var1; I need an exact match. With the example above, var should match nothing, and var1 should match only the first element of the list.

I presume that the answer may be combining regex with negation of other regex, but I am not sure how to solve this problem.

OK, I noticed that I missed one important thing. The variable list is gathered from a string, so there may be a space before the variable name or a symbol after it. A more accurate variable_list would be something like:

>>> variable_list = [' var1;', 'var1 ;', 'var1)', 'var1_more']

In this case it should recognize the first three items as var1, but not the last one.

It sounds like you just need to anchor your regex with ^ and $ , unless I'm not understanding you properly:

>>> mylist = ['var1', 'var2', 'var3_something', 'var1_text', 'var1var1']
>>> import re
>>> r = re.compile(r'^var1$')
>>> matches = [item for item in mylist if r.match(item)]
>>> print(matches)
['var1']

So ^var1$ will match exactly var1 , but not var1_text or var1var1 . Is that what you're after?
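
One caveat with `$` that may matter here, since the items come from a file read line by line: in Python, `$` also matches just before a trailing newline. A minimal sketch of the difference, assuming Python 3.4 or later (where `re.fullmatch` is available); the sample list is made up for illustration:

```python
import re

# Hypothetical sample data; the last item keeps the newline left by reading a file.
items = ['var1', 'var1_text', 'var1var1', 'var1\n']

anchored = re.compile(r'^var1$')
exact = re.compile(r'var1')

# '$' also matches just before a trailing newline, so 'var1\n' slips through.
anchored_hits = [item for item in items if anchored.match(item)]

# fullmatch() requires the whole string to match, newline included.
exact_hits = [item for item in items if exact.fullmatch(item)]

print(anchored_hits)  # ['var1', 'var1\n']
print(exact_hits)     # ['var1']
```

If the lines are stripped before matching, both approaches should behave the same.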


I suppose one way to handle your edit would be with ^\W*var1\W*$ (where var1 is the variable name you want). The \W shorthand character class matches anything that is not in the \w class, and \w in Python is basically alphanumeric characters plus the underscore. The * means that this may be matched zero or more times. This results in:

>>> variable_list = [' var1;', 'var1 ;', 'var1)', 'var1_more']
>>> r = re.compile(r'^\W*var1\W*$')
>>> matches = [item for item in variable_list if r.match(item)]
>>> print(matches)
[' var1;', 'var1 ;', 'var1)']

If you want the name of the variable without the extraneous stuff then you can capture it and extract the first capture group. Something like this, maybe (probably a bit inefficient since the regex runs twice on matched items):

>>> r = re.compile(r'^\W*(var1)\W*$')
>>> matches = [r.match(item).group(1) for item in variable_list if r.match(item)]
>>> print(matches)
['var1', 'var1', 'var1']
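
If the double call to `match` is a concern, or the name to look for is not known in advance, something along these lines might work. This is only a sketch: `make_matcher` is a hypothetical helper name, and `re.escape` is there in case a name ever contains regex metacharacters:

```python
import re

def make_matcher(name):
    """Build the ^\\W*(name)\\W*$ pattern for an arbitrary name (hypothetical helper)."""
    # re.escape guards against names containing regex metacharacters.
    return re.compile(r'^\W*(' + re.escape(name) + r')\W*$')

variable_list = [' var1;', 'var1 ;', 'var1)', 'var1_more']
r = make_matcher('var1')

# Call match() once per item and keep only the successful match objects.
found = [m.group(1) for m in map(r.match, variable_list) if m]
print(found)  # ['var1', 'var1', 'var1']
```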

If you are trying to learn about regular expressions, then maybe this is a useful puzzle, but if you only want to see whether a certain word is in a list of words, why not this:

>>> 'var1' in mylist
True
>>> 'var1 ' in mylist
False
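
For the edited case, where items carry stray spaces or punctuation, plain membership can still work if each string is first reduced to its identifier-like tokens. A rough sketch, assuming names consist only of `\w` characters; `known_names` and `lines` are made-up sample data:

```python
import re

known_names = {'var1', 'var2'}   # hypothetical set of variable names
lines = [' var1;', 'var1 ;', 'var1)', 'var1_more']

# \w+ grabs whole runs of letters, digits and underscores, so 'var1_more'
# stays a single token and is never mistaken for 'var1'.
hits = [line for line in lines
        if any(token in known_names for token in re.findall(r'\w+', line))]
print(hits)  # [' var1;', 'var1 ;', 'var1)']
```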

Not to expand too much more on the regex match, but you might consider using the 'filter()' builtin:

filter(function, iterable) 

So, using one of the regexes suggested by @eldarerathis:

>>> mylist = ['var1', 'var2', 'var3_something', 'var1_text', 'var1var1']
>>> import re
>>> r = re.compile(r'^var1$')

>>> matches = list(filter(r.match, mylist))
>>> matches
['var1']

Or using your own match function:

>>> def matcher(value):
...     # placeholder test; substitute your own matching logic here
...     return r.match(value) is not None
...
>>> list(filter(matcher, mylist))
['var1']

Or negate the match with a lambda:

>>> list(filter(lambda x: not r.match(x), mylist))
['var2', 'var3_something', 'var1_text', 'var1var1']
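
A note for Python 3, where `filter()` returns a lazy iterator rather than a list: wrapping it in `list()` gives the output shown above. As a sketch of an alternative to the negating lambda, `itertools.filterfalse` from the standard library keeps the items the predicate rejects:

```python
import re
from itertools import filterfalse

mylist = ['var1', 'var2', 'var3_something', 'var1_text', 'var1var1']
r = re.compile(r'^var1$')

# In Python 3, filter() is lazy; list() materializes the results.
matched = list(filter(r.match, mylist))

# filterfalse() keeps the items for which the predicate is falsy,
# which avoids wrapping the match in a negating lambda.
unmatched = list(filterfalse(r.match, mylist))

print(matched)    # ['var1']
print(unmatched)  # ['var2', 'var3_something', 'var1_text', 'var1var1']
```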
