How do I search through regex matches in Python?

I need to try a string against multiple regexes and execute a different piece of code depending on which one it matches. The regexes are mutually exclusive: a string that matches one of them can't match any of the others. What I have currently is:

m = firstre.match(str)
if m:
    # Do something

m = secondre.match(str)
if m:
    # Do something else

m = thirdre.match(str)
if m:
    # Do something different from both

Apart from the ugliness, this code tries every regex even after one of them (say firstre) has already matched, which is inefficient. I tried to use:

elif m = secondre.match(str)

but learnt that assignment is not allowed in if statements.

Is there an elegant way to achieve what I want?

import re

def doit(s):

    # with some side-effect on a
    a = []

    def f1(s, m):
        a.append(1)
        print('f1', a, s, m)

    def f2(s, m):
        a.append(2)
        print('f2', a, s, m)

    def f3(s, m):
        a.append(3)
        print('f3', a, s, m)

    re1 = re.compile('one')
    re2 = re.compile('two')
    re3 = re.compile('three')

    # Pair each handler with its regex; order determines priority
    func_re_list = (
        (f1, re1),
        (f2, re2),
        (f3, re3),
    )
    # Try each regex in turn and stop at the first one that matches
    for myfunc, myre in func_re_list:
        m = myre.match(s)
        if m:
            myfunc(s, m)
            break


doit('one')
doit('two')
doit('three')

This might be over-engineering the solution a bit, but you could combine them into a single regexp with named groups and see which group matched. This could be encapsulated as a helper class:

import re
class MultiRe(object):
    def __init__(self, **regexps):
        self.keys = regexps.keys()
        self.union_re = re.compile("|".join("(?P<%s>%s)" % kv for kv in regexps.items()))

    def match(self, string, *args):
        result = self.union_re.match(string, *args)
        if result:
            for key in self.keys:
                if result.group(key) is not None:
                    return key

Lookup would be like this:

multi_re = MultiRe(foo='fo+', bar='ba+r', baz='ba+z')
match = multi_re.match('baaz')
if match == 'foo':
    pass  # one thing
elif match == 'bar':
    pass  # some other thing
elif match == 'baz':
    pass  # or this
else:
    pass  # no match
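
The if/elif chain on the returned key can also be replaced by a dictionary lookup. The following is only a sketch of that idea, not part of the answer above: it uses the standard `lastgroup` attribute of the match object instead of the MultiRe helper, and the handler functions and the names HANDLERS, UNION_RE and dispatch are hypothetical.

```python
import re

# Hypothetical handlers, one per named group (illustrative only).
def handle_foo(text):
    return "foo handler: " + text

def handle_bar(text):
    return "bar handler: " + text

def handle_baz(text):
    return "baz handler: " + text

HANDLERS = {"foo": handle_foo, "bar": handle_bar, "baz": handle_baz}

# One named group per alternative, same patterns as in the lookup example.
UNION_RE = re.compile(r"(?P<foo>fo+)|(?P<bar>ba+r)|(?P<baz>ba+z)")

def dispatch(text):
    m = UNION_RE.match(text)
    if m is None:
        return None  # no alternative matched
    # m.lastgroup is the name of the group that matched; this assumes the
    # alternatives contain no nested named groups of their own.
    return HANDLERS[m.lastgroup](m.group())
```

With this layout, adding a new case means adding one alternative to the pattern and one entry to the dictionary.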

This is a good application for the undocumented but very useful re.Scanner class.
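
A minimal sketch of how re.Scanner might be used for this problem. Because the class is undocumented, the behaviour shown is based on the CPython implementation and may change between versions. The callback names and the patterns (borrowed from the fo+/ba+r/ba+z example) are assumptions for illustration.

```python
import re

# Each callback receives the scanner object and the matched text.
# The callback names are hypothetical.
def on_foo(scanner, token):
    return ("foo", token)

def on_bar(scanner, token):
    return ("bar", token)

def on_baz(scanner, token):
    return ("baz", token)

# The lexicon is a list of (pattern, action) pairs, tried in order.
scanner = re.Scanner([
    (r"fo+", on_foo),
    (r"ba+r", on_bar),
    (r"ba+z", on_baz),
])

# scan() returns the list of action results and the unmatched remainder.
results, remainder = scanner.scan("baaz")
```

Note that scan() keeps matching from where the previous match ended until nothing matches, so a non-empty remainder indicates input that none of the patterns accepted.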

A few ideas, none of them necessarily good, but one of them might fit your code well:

How about putting the code in a separate function, e.g. MatchRegex(), which returns which regex matched? That way, inside the function, you can return as soon as the first (or second) regex matches, which removes the inefficiency.
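
One way such a function could look, as a sketch only: the name match_regex, the labels "first", "second" and "third", and the three patterns are all assumptions made for illustration.

```python
import re

# Hypothetical patterns standing in for firstre, secondre and thirdre.
FIRST_RE = re.compile(r"foo")
SECOND_RE = re.compile(r"bar")
THIRD_RE = re.compile(r"baz")

def match_regex(s):
    """Return (label, match) for the first regex that matches, else (None, None)."""
    for label, pattern in (("first", FIRST_RE),
                           ("second", SECOND_RE),
                           ("third", THIRD_RE)):
        m = pattern.match(s)
        if m:
            # Returning here skips the remaining regexes.
            return label, m
    return None, None
```

The caller can then branch on the returned label and still has the match object available for group lookups.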

Of course, you could always go with just nested if statements:

m = firstre.match(str)
if m:
    # Do something
else:
    m = secondre.match(str)
    ...

I really don't see any reason not to go with nested ifs. They're very easy to understand and as efficient as you want. I'd go for them just for their simplicity.

You could use

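def do_first(s, regexes, actions):
    # Run the action paired with the first regex that matches, then stop.
    # The loop variable is not called "re" so the re module is not shadowed.
    for regex, action in zip(regexes, actions):
        m = regex.match(s)
        if m:
            action(s)
            return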

So, for example, say you've defined

import re

def do_something_1(s):
    print("#1: %s" % s)

def do_something_2(s):
    print("#2: %s" % s)

def do_something_3(s):
    print("#3: %s" % s)

firstre  = re.compile("foo")
secondre = re.compile("bar")
thirdre  = re.compile("baz")

Then call it with

do_first("baz",
         [firstre,        secondre,       thirdre],
         [do_something_1, do_something_2, do_something_3])

Early returns, perhaps?

def doit(s):
    m = re1.match(s)
    if m:
        # Do something
        return

    m = re2.match(s)
    if m:
        # Do something else
        return

    ...

Ants Aasma's answer is good too. If you prefer less scaffolding, you can write that out yourself using the verbose regex syntax.

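import re

# Named "pattern" rather than "re" so the re module is not shadowed
pattern = re.compile(r'''(?x)    # set the verbose flag
    (?P<foo> fo+ )
  | (?P<bar> ba+r )
    # | ...other alternatives...
''')

def doit(s):
    m = pattern.match(s)
    if m is None:
        return  # nothing matched, so there is no group to inspect
    if m.group('foo'):
        pass  # Do something
    elif m.group('bar'):
        pass  # Do something else
    ...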

I've done this a lot. It's fast and it works with re.finditer.
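
A short sketch of the re.finditer variant, assuming the same foo/bar alternatives. The sample text and the name TOKEN_RE are made up for illustration; `lastgroup` is the standard match attribute holding the name of the group that matched.

```python
import re

TOKEN_RE = re.compile(r'''(?x)    # verbose flag, as above
    (?P<foo> fo+ )
  | (?P<bar> ba+r )
''')

text = "foo baar fooo"  # made-up sample input

# finditer yields one match per occurrence; lastgroup tells which
# named alternative produced it.
tokens = [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]
```

Text that matches none of the alternatives (here, the spaces) is silently skipped by finditer, so this suits scanning more than strict validation.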

Do it with an elif in case you just need a True/False out of the regex matching:

if regex1.match(str):
    # do stuff
elif regex2.match(str):
    # and so on
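
If the match object itself is needed, Python 3.8 and later allow an assignment expression (the := operator) inside the condition, which gives the chain the question was originally after. A minimal sketch, with hypothetical patterns and a hypothetical classify function:

```python
import re

# Hypothetical patterns, for illustration only.
regex1 = re.compile(r"(\d+)")
regex2 = re.compile(r"([a-z]+)")

def classify(s):
    # Requires Python 3.8+: := binds m and tests it in the same expression.
    if m := regex1.match(s):
        return "number", m.group(1)
    elif m := regex2.match(s):
        return "word", m.group(1)
    return None
```

Later regexes are only tried when the earlier ones fail, so no unnecessary matching is done.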
