Python Regex match or potential match
How do I use Python's regular expression module (re) to determine whether a match has been made, or whether a match could still be made?
I want a regex pattern which searches for a sequence of words in the correct order, regardless of what's between them. I want a function which returns Yes if the pattern is found, Maybe if a match could still be found, or No if no match can be found. We are looking for the pattern One|....|Two|....|Three. Here are some examples (note that the names, their count, and their order are not important; all I care about is the three words One, Two and Three, and the acceptable words in between are John, Malkovich, Stamos and Travolta).
Returns YES:
One|John|Malkovich|Two|John|Stamos|Three|John|Travolta
Returns YES:
One|John|Two|John|Three|John
Returns YES:
One|Two|Three
Returns MAYBE:
One|Two
Returns MAYBE:
One
Returns NO:
Three|Two|One
I understand the examples are not airtight, so here is what I have for the regex to get YES:
if re.match(r'One\|(John\||Malkovich\||Stamos\||Travolta\|)*Two\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'
Obviously if the input is Three|Two|One the above will fail, and we can return No, but how do I check for the Maybe case? I thought about nesting the parentheses, like so (note: not tested):
if re.match(r'One\|((John\||Malkovich\||Stamos\||Travolta\|)*Two(\|(John\||Malkovich\||Stamos\||Travolta\|)*Three\|(John\||Malkovich\||Stamos\||Travolta\|)*)*)*', 'One|John|Malkovich|Two|John|Stamos|Three|John|Travolta') is not None:
    return 'Yes'
But I don't think that will do what I want it to do.
I am not actually looking for Travoltas and Malkovichs (shocking, I know). I am matching against inotify patterns such as IN_MOVE, IN_CREATE and IN_OPEN. I am logging them and getting hundreds of them, and then I go in and look for a particular pattern such as IN_ACCESS ... IN_OPEN .... IN_MODIFY, but in some cases I don't want an IN_DELETE after the IN_OPEN and in others I do. I'm essentially pattern matching to use inotify to detect when text editors go wild and try to crush programmers' souls by doing a temporary-file-swap-save instead of just modifying the file. I don't want to free up those logs instantly, but I only want to hold on to them for as long as is necessary. Maybe means don't erase the logs. Yes means do something and then erase the log, and No means don't do anything but still erase the logs. As I will have multiple rules for each program (i.e. vim vs. gedit vs. emacs), I wanted to use a regular expression, which would be more human-readable and easier to write than creating a massive tree or, as user Joel suggested, just going over the words with a loop.
I wouldn't use a regex for this. But it's definitely possible:
import re

regex = re.compile(
    r"""^                # Start of string
    (?:                  # Match...
      (?:                # one of the following:
        One()            # One (use empty capturing group to indicate match)
      |                  # or
        \1Two()          # Two if One has matched previously
      |                  # or
        \1\2Three()      # Three if One and Two have matched previously
      |                  # or
        John             # any of the other strings
      |                  # etc.
        Malkovich
      |
        Stamos
      |
        Travolta
      )                  # End of alternation
      \|?                # followed by optional separator
    )*                   # any number of repeats
    $                    # until the end of the string.""",
    re.VERBOSE)
Now you can check for YES and MAYBE by checking if you get a match at all:
>>> yes = regex.match("One|John|Malkovich|Two|John|Stamos|Three|John|Travolta")
>>> yes
<_sre.SRE_Match object at 0x0000000001F90620>
>>> maybe = regex.match("One|John|Malkovich|Two|John|Stamos")
>>> maybe
<_sre.SRE_Match object at 0x0000000001F904F0>
And you can differentiate between YES and MAYBE by checking whether all of the groups have participated in the match (i.e. are not None):
>>> yes.groups()
('', '', '')
>>> maybe.groups()
('', '', None)
And if the regex doesn't match at all, that's a NO for you:
>>> no = regex.match("Three|Two|One")
>>> no is None
True
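Putting the three checks together, a minimal sketch of the Yes/Maybe/No function the question asks for could look like this. The names PROGRESS_RE and classify are made up for illustration; the pattern is the one from this answer, written more compactly.

```python
import re

# Same idea as the regex above: the empty groups act as "already seen" flags.
PROGRESS_RE = re.compile(
    r"""^
    (?:
        (?:
            One()            # group 1 participates once One has been seen
          | \1Two()          # Two only matches after One
          | \1\2Three()      # Three only matches after One and Two
          | John | Malkovich | Stamos | Travolta
        )
        \|?                  # optional separator
    )*
    $""",
    re.VERBOSE,
)


def classify(events):
    """Hypothetical helper: return 'Yes', 'Maybe' or 'No' for a '|'-separated string."""
    m = PROGRESS_RE.match(events)
    if m is None:
        return "No"          # an unknown word, or a keyword out of order
    if all(g is not None for g in m.groups()):
        return "Yes"         # One, Two and Three were all seen, in order
    return "Maybe"           # valid so far, but still incomplete


print(classify("One|Two"))
```

Note that with this pattern an empty string also comes back as Maybe, since nothing has gone wrong yet.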
Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. - Jamie Zawinski
Perhaps an algorithm like this would be more appropriate. Here is some pseudocode.
matchlist.current = matchlist.first()
for each word in input
    if word = matchlist.current
        matchlist.current = matchlist.next() // assuming next returns null if at end of list
    else if not allowedlist.contains(word)
        return 'No'
if matchlist.current = null // we hit the end of the list
    return 'Yes'
return 'Maybe'
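For what it's worth, here is one way the pseudocode might translate to Python. This is only a sketch: the function name classify_events and its parameters are invented for illustration, and it assumes the input is a '|'-separated string as in the question.

```python
def classify_events(events, required, allowed):
    """Hypothetical helper mirroring the pseudocode: 'Yes', 'Maybe' or 'No'.

    required: words that must appear, in order (e.g. One, Two, Three)
    allowed:  filler words that may appear anywhere in between
    """
    remaining = iter(required)
    current = next(remaining, None)      # None once every required word was seen
    words = events.split("|") if events else []
    for word in words:
        if word == current:
            current = next(remaining, None)
        elif word not in allowed:
            return "No"                  # unknown word, or a keyword out of order
    return "Yes" if current is None else "Maybe"


fillers = {"John", "Malkovich", "Stamos", "Travolta"}
print(classify_events("One|John|Two", ["One", "Two", "Three"], fillers))
```

Because the rule is just a list plus a set, a per-editor rule for inotify events could be written as, say, classify_events(log, ["IN_ACCESS", "IN_OPEN", "IN_MODIFY"], {"IN_CLOSE_WRITE"}), where that particular rule is only an example. One difference from the regex answer: a repeated keyword such as One|One|Two is rejected here unless the keyword is also in the allowed set.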