
Python Regex match or potential match

Question:

How do I use Python's regular expression module (re) to determine whether a match has been made, or whether a potential match could still be made?

Details:

I want a regex that searches for a sequence of words in the correct order, regardless of what is between them. I want a function that returns Yes if the sequence is found, Maybe if a match could still be found, or No if no match can be found. We are looking for the pattern One|....|Two|....|Three. Here are some examples. (Note: the names, their count, and their order are not important. All I care about is the three words One, Two and Three; the acceptable words in between are John, Malkovich, Stamos and Travolta.)

Returns YES:

One|John|Malkovich|Two|John|Stamos|Three|John|Travolta

Returns YES:

One|John|Two|John|Three|John

Returns YES:

One|Two|Three

Returns MAYBE:

One|Two

Returns MAYBE:

One

Returns NO:

Three|Two|One

I understand the examples are not airtight, so here is what I have for the regex to get YES:

if re.match(r'One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'

Obviously, if the input is Three|Two|One the above will fail and we can return No, but how do I check for the Maybe case? I thought about nesting the parentheses, like so (note: not tested):

if re.match(r'One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'

But I don't think that will do what I want it to do.

More Details:

I am not actually looking for Travoltas and Malkovichs (shocking, I know). I am matching against inotify patterns such as IN_MOVE, IN_CREATE and IN_OPEN. I log them and get hundreds of them, then I go in and look for a particular pattern such as IN_ACCESS ... IN_OPEN .... IN_MODIFY, but in some cases I don't want an IN_DELETE after the IN_OPEN and in others I do. I'm essentially pattern matching to use inotify to detect when text editors have gone wild and try to crush programmers' souls by doing a temporary-file-swap-save instead of just modifying the file. I don't want to free up those logs instantly, but I only want to hold on to them for as long as is necessary. Maybe means don't erase the logs. Yes means do something, then erase the logs, and No means don't do anything but still erase the logs. As I will have multiple rules for each program (i.e. vim vs. gedit vs. emacs), I wanted to use a regular expression, which would be more human-readable and easier to write than creating a massive tree or, as user Joel suggested, just going over the words with a loop.

I wouldn't use a regex for this. But it's definitely possible:

import re

regex = re.compile(
    r"""^           # Start of string
    (?:             # Match...
     (?:            # one of the following:
      One()         # One (use empty capturing group to indicate match)
     |              # or
      \1Two()       # Two if One has matched previously
     |              # or
      \1\2Three()   # Three if One and Two have matched previously
     |              # or
      John          # any of the other strings
     |              # etc.
      Malkovich
     |
      Stamos
     |
      Travolta
     )              # End of alternation
     \|?            # followed by optional separator
    )*              # any number of repeats
    $               # until the end of the string.""", 
    re.VERBOSE)

Now you can check for YES or MAYBE by checking whether you get a match at all:

>>> yes = regex.match("One|John|Malkovich|Two|John|Stamos|Three|John|Travolta")
>>> yes
<_sre.SRE_Match object at 0x0000000001F90620>
>>> maybe = regex.match("One|John|Malkovich|Two|John|Stamos")
>>> maybe
<_sre.SRE_Match object at 0x0000000001F904F0>

And you can differentiate between YES and MAYBE by checking whether all of the groups have participated in the match (i.e. are not None):

>>> yes.groups()
('', '', '')
>>> maybe.groups()
('', '', None)

And if the regex doesn't match at all, that's a NO for you:

>>> no = regex.match("Three|Two|One")
>>> no is None
True
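
The three checks above can be folded into one helper. What follows is only a minimal sketch, not part of the original answer: the names build_regex and classify are hypothetical, and it assumes that none of the words contains the | separator. It generates the same backreference-based pattern from a list of required words and a list of allowed filler words, so a separate rule could be built per editor.

```python
import re

def build_regex(required, allowed):
    """Build the backreference pattern for an ordered list of required words.

    Hypothetical helper: `required` must appear in order, `allowed` may
    appear anywhere in between.
    """
    branches = []
    for i, word in enumerate(required):
        # A backreference to an empty group succeeds only if that group has
        # already participated, so this prefix enforces the ordering.
        prefix = "".join(r"(?:\%d)" % n for n in range(1, i + 1))
        branches.append(prefix + re.escape(word) + "()")
    branches.extend(re.escape(word) for word in allowed)
    return re.compile(r"^(?:(?:%s)\|?)*$" % "|".join(branches))

def classify(regex, text):
    """Return 'Yes', 'Maybe' or 'No' for a '|'-separated string."""
    match = regex.match(text)
    if match is None:
        return "No"       # an unexpected or out-of-order word was seen
    if all(group is not None for group in match.groups()):
        return "Yes"      # every required word has been matched
    return "Maybe"        # valid so far, but some required words are missing

rule = build_regex(["One", "Two", "Three"],
                   ["John", "Malkovich", "Stamos", "Travolta"])
print(classify(rule, "One|John|Two"))
```

Note that with this pattern an empty string counts as Maybe, since nothing has ruled a match out yet.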

Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski

Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.

matchlist.current = matchlist.first()
for each word in input
    if word = matchlist.current
        matchlist.current = matchlist.next() // assuming next returns null if at end of list
    else if not allowedlist.contains(word)
        return 'No'
if matchlist.current = null // we hit the end of the list
    return 'Yes'
return 'Maybe'
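
For reference, the pseudocode translates fairly directly into Python. This is a sketch under a few assumptions of mine rather than anything stated in the answer: the function name classify_words and the sep parameter are hypothetical, the input is a single string split on |, and empty pieces (from a leading or trailing separator) are skipped.

```python
def classify_words(text, required, allowed, sep="|"):
    """Return 'Yes', 'Maybe' or 'No' for a separator-delimited string."""
    allowed = set(allowed)
    position = 0  # index of the next required word we are waiting for
    for word in text.split(sep):
        if not word:
            continue  # assumption: ignore empty pieces around separators
        if position < len(required) and word == required[position]:
            position += 1      # found the next required word, advance
        elif word not in allowed:
            return "No"        # unexpected or out-of-order word
    # All required words seen means Yes; otherwise a match is still possible.
    return "Yes" if position == len(required) else "Maybe"

required = ["One", "Two", "Three"]
allowed = ["John", "Malkovich", "Stamos", "Travolta"]
print(classify_words("One|John|Two", required, allowed))
```

As in the pseudocode, a required word that shows up again after the sequence is complete is treated as unexpected and yields No, unless it is also in the allowed list.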


 