How do I use Python's regular expression module (re) to determine if a match has been made, or that a potential match could be made?
I want a regex pattern which searches for a pattern of words in the correct order regardless of what's between them. I want a function which returns Yes if found, Maybe if a match could still be found, or No if no match can be found. We are looking for the pattern One|....|Two|....|Three. Here are some examples (note that the names, their count, and their order are not important; all I care about is the three words One, Two and Three, and the acceptable words in between are John, Malkovich, Stamos and Travolta).
Returns YES:
One|John|Malkovich|Two|John|Stamos|Three|John|Travolta
Returns YES:
One|John|Two|John|Three|John
Returns YES:
One|Two|Three
Returns MAYBE:
One|Two
Returns MAYBE:
One
Returns NO:
Three|Two|One
I understand the examples are not airtight, so here is what I have for the regex to get YES:
if re.match(r'One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'
Obviously if the pattern is Three|Two|One the above will fail, and we can return No, but how do I check for the Maybe case? I thought about nesting the parentheses, like so (note, not tested):
if re.match(r'One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'
But I don't think that will do what I want it to do.
I am not actually looking for Travoltas and Malkovichs (shocking, I know). I am matching against inotify patterns such as IN_MOVE, IN_CREATE and IN_OPEN. I log them and get hundreds of them, then I go in and look for a particular pattern such as IN_ACCESS ... IN_OPEN .... IN_MODIFY, but in some cases I don't want an IN_DELETE after the IN_OPEN and in others I do. I'm essentially pattern matching to use inotify to detect when text editors go wild and try to crush programmers' souls by doing a temporary-file-swap-save instead of just modifying the file. I don't want to free up those logs instantly, but I only want to hold on to them for as long as is necessary. Maybe means don't erase the logs. Yes means do something, then erase the log, and No means don't do anything but still erase the logs. As I will have multiple rules for each program (i.e. vim v gedit v emacs), I wanted to use a regular expression, which would be more human readable and easier to write than creating a massive tree or, as user Joel suggested, just going over the words with a loop.
I wouldn't use a regex for this. But it's definitely possible:
import re

regex = re.compile(
    r"""^               # Start of string
    (?:                 # Match...
     (?:                # one of the following:
      One()             # One (use empty capturing group to indicate match)
     |                  # or
      \1Two()           # Two if One has matched previously
     |                  # or
      \1\2Three()       # Three if One and Two have matched previously
     |                  # or
      John              # any of the other strings
     |                  # etc.
      Malkovich
     |
      Stamos
     |
      Travolta
     )                  # End of alternation
     \|?                # followed by optional separator
    )*                  # any number of repeats
    $                   # until the end of the string.""",
    re.VERBOSE)
Now you can check for YES and MAYBE by checking if you get a match at all:
>>> yes = regex.match("One|John|Malkovich|Two|John|Stamos|Three|John|Travolta")
>>> yes
<_sre.SRE_Match object at 0x0000000001F90620>
>>> maybe = regex.match("One|John|Malkovich|Two|John|Stamos")
>>> maybe
<_sre.SRE_Match object at 0x0000000001F904F0>
And you can differentiate between YES and MAYBE by checking whether all of the groups have participated in the match (i.e. are not None):
>>> yes.groups()
('', '', '')
>>> maybe.groups()
('', '', None)
And if the regex doesn't match at all, that's a NO for you:
>>> no = regex.match("Three|Two|One")
>>> no is None
True
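The three checks above can be folded into one helper. The following is only a sketch and not part of the original answer: the name classify is hypothetical, and the pattern is the same one as above written more compactly.

```python
import re

# Same pattern as above; each empty group () records that its keyword was seen.
PATTERN = re.compile(
    r"""^
    (?:
        (?: One() | \1Two() | \1\2Three() | John | Malkovich | Stamos | Travolta )
        \|?             # optional separator
    )*
    $""",
    re.VERBOSE)


def classify(events):
    """Return 'Yes', 'Maybe' or 'No' for a '|'-separated string of words."""
    match = PATTERN.match(events)
    if match is None:
        # An unknown word, or a keyword out of order.
        return 'No'
    if all(group is not None for group in match.groups()):
        # One, Two and Three have all participated in the match.
        return 'Yes'
    return 'Maybe'
```

With this helper, classify("One|Two") gives 'Maybe' and classify("Three|Two|One") gives 'No'. Note that because the separator is optional in the pattern, a string such as "OneJohn" is also accepted; tighten the pattern if that matters for your logs.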
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.
matchlist.current = matchlist.first()
for each word in input
    if word = matchlist.current
        matchlist.current = matchlist.next() // assuming next returns null if at end of list
    else if not allowedlist.contains(word)
        return 'No'
if matchlist.current = null // we hit the end of the list
    return 'Yes'
return 'Maybe'
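For reference, the pseudocode might translate to Python roughly as follows. This is a sketch under a few assumptions: the function name classify_words and its parameters are made up for illustration, the input is taken to be a single '|'-separated string, and the final Yes check is placed after the loop so that trailing allowed words are still validated.

```python
def classify_words(events,
                   required=('One', 'Two', 'Three'),
                   allowed=frozenset({'John', 'Malkovich', 'Stamos', 'Travolta'})):
    """Walk the words once and return 'Yes', 'Maybe' or 'No'."""
    position = 0  # index of the next required word we are waiting for
    for word in events.split('|'):
        if position < len(required) and word == required[position]:
            # Found the word we were waiting for; move on to the next one.
            position += 1
        elif word not in allowed:
            # Unknown word, or a required word in the wrong place.
            return 'No'
    if position == len(required):
        # Every required word was seen, in order.
        return 'Yes'
    return 'Maybe'
```

Since the required and allowed words are plain arguments, a separate rule per program (vim, gedit, emacs) can be expressed by calling the function with different tuples, for example classify_words(log, required=('IN_ACCESS', 'IN_OPEN', 'IN_MODIFY'), allowed=...).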