
Python Regex match or potential match

Question:

How do I use Python's regular expression module (re) to determine whether a match has been made, or whether a potential match could still be made?

Details:

I want a regex that searches for a sequence of words in the correct order, regardless of what is between them. I want a function which returns Yes if the sequence is found, Maybe if a match could still be found, or No if no match can be found. We are looking for the pattern One|....|Two|....|Three. Here are some examples. (Note: the names, their count and their order are not important; all I care about is the three words One, Two and Three. The acceptable words in between are John, Malkovich, Stamos and Travolta.)

Returns YES:

One|John|Malkovich|Two|John|Stamos|Three|John|Travolta

Returns YES:

One|John|Two|John|Three|John

Returns YES:

One|Two|Three

Returns MAYBE:

One|Two

Returns MAYBE:

One

Returns NO:

Three|Two|One

I understand the examples are not airtight, so here is what I have for the regex to get YES:

if re.match(r'One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'

Obviously, if the input is Three|Two|One the above will fail and we can return No, but how do I check for the Maybe case? I thought about nesting the parentheses, like so (note: not tested):

if re.match(r'One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'

But I don't think that will do what I want it to do.
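One way the nesting idea might be made to work is to turn each later stage into an optional group and require the whole string to match. The following is only a sketch under assumptions that are not in the question: the names FILLER, PATTERN and classify are hypothetical, words are separated by a single literal |, and an empty string is treated as No.

```python
import re

# Hypothetical helper: the words allowed between the required ones.
FILLER = r"(?:John|Malkovich|Stamos|Travolta)"

# Each later stage is optional, so a prefix of the full sequence still matches.
# The named group "three" participates only if Three was reached.
PATTERN = re.compile(
    rf"One(?:\|{FILLER})*"
    rf"(?:\|Two(?:\|{FILLER})*"
    rf"(?P<three>\|Three(?:\|{FILLER})*)?"
    rf")?"
)

def classify(log):
    """Return 'Yes', 'Maybe' or 'No' for a |-separated string of words."""
    m = PATTERN.fullmatch(log)  # the whole string must fit the pattern
    if m is None:
        return "No"
    return "Yes" if m.group("three") is not None else "Maybe"
```

With this shape, adding a fourth required word means adding one more nested optional group, which is where readability starts to suffer.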

More Details:

I am not actually looking for Travoltas and Malkoviches (shocking, I know). I am matching against inotify events such as IN_MOVE, IN_CREATE and IN_OPEN. I log them, end up with hundreds of them, and then go back and look for a particular pattern such as IN_ACCESS ... IN_OPEN ... IN_MODIFY; in some cases I don't want an IN_DELETE after the IN_OPEN, and in others I do. Essentially, I am pattern matching on inotify events to detect when text editors go wild and try to crush programmers' souls by doing a temporary-file-swap-save instead of just modifying the file. I don't want to free up those logs instantly, but I only want to hold on to them for as long as necessary. Maybe means don't erase the logs. Yes means do something, then erase the logs. No means don't do anything, but still erase the logs. As I will have multiple rules for each program (i.e. vim vs. gedit vs. emacs), I wanted to use a regular expression, which would be more human readable and easier to write than creating a massive tree or, as user Joel suggested, just going over the words with a loop.

I wouldn't use a regex for this. But it's definitely possible:

import re

regex = re.compile(
    r"""^           # Start of string
    (?:             # Match...
     (?:            # one of the following:
      One()         # One (use empty capturing group to indicate match)
     |              # or
      \1Two()       # Two if One has matched previously
     |              # or
      \1\2Three()   # Three if One and Two have matched previously
     |              # or
      John          # any of the other strings
     |              # etc.
      Malkovich
     |
      Stamos
     |
      Travolta
     )              # End of alternation
     \|?            # followed by optional separator
    )*              # any number of repeats
    $               # until the end of the string.""", 
    re.VERBOSE)

Now you can check for YES and MAYBE by checking if you get a match at all:

>>> yes = regex.match("One|John|Malkovich|Two|John|Stamos|Three|John|Travolta")
>>> yes
<_sre.SRE_Match object at 0x0000000001F90620>
>>> maybe = regex.match("One|John|Malkovich|Two|John|Stamos")
>>> maybe
<_sre.SRE_Match object at 0x0000000001F904F0>

And you can differentiate between YES and MAYBE by checking whether all of the groups have participated in the match (i.e. none of them is None):

>>> yes.groups()
('', '', '')
>>> maybe.groups()
('', '', None)

And if the regex doesn't match at all, that's a NO for you:

>>> no = regex.match("Three|Two|One")
>>> no is None
True
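Putting the three checks together, a small wrapper could look like the sketch below. The function name classify and the return strings are assumptions chosen to mirror the question; the pattern is the same one as above, repeated only so that the block runs on its own.

```python
import re

# Same pattern as above: each empty group records that a required word was seen,
# and a backreference to a group that has not participated fails to match.
regex = re.compile(
    r"""^
    (?:
     (?:
      One()           # group 1 is set once One has matched
     |
      \1Two()         # Two is allowed only after One
     |
      \1\2Three()     # Three is allowed only after One and Two
     |
      John | Malkovich | Stamos | Travolta
     )
     \|?              # optional separator
    )*
    $""",
    re.VERBOSE)

def classify(log):
    """Return 'Yes', 'Maybe' or 'No' (hypothetical helper, not part of re)."""
    m = regex.match(log)
    if m is None:
        return "No"                      # an out-of-order or unknown word
    if all(g is not None for g in m.groups()):
        return "Yes"                     # One, Two and Three all participated
    return "Maybe"                       # valid so far, but incomplete
```

Note that an empty string matches with zero repetitions, so this sketch reports it as Maybe.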

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.

matchlist.current = matchlist.first()
for each word in input
    if word = matchlist.current
        matchlist.current = matchlist.next() // assuming next returns null if at end of list
    else if not allowedlist.contains(word)
        return 'No'
if matchlist.current = null // we hit the end of the list
    return 'Yes'
return 'Maybe'
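A minimal Python rendering of the pseudocode above might look like this. It is a sketch under assumptions: the function name classify and its parameters required and allowed are hypothetical, the input is a single string with words separated by |, and a required word that appears out of turn is rejected unless it is also in the allowed list.

```python
def classify(log,
             required=("One", "Two", "Three"),
             allowed=("John", "Malkovich", "Stamos", "Travolta")):
    """Return 'Yes', 'Maybe' or 'No' for a |-separated string of words."""
    position = 0  # index of the next required word we are waiting for
    for word in log.split("|"):
        if position < len(required) and word == required[position]:
            position += 1          # the expected word arrived, wait for the next
        elif word not in allowed:
            return "No"            # neither expected nor an allowed filler
    # All required words seen in order means Yes; otherwise more may follow.
    return "Yes" if position == len(required) else "Maybe"
```

Because the required and allowed words are plain sequences, one rule per editor could be expressed as a pair of tuples rather than a hand-written pattern.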
