Regex: How to get span or group of *pattern* which matches input string

Note: this question is similar to this and this, but I was unable to resolve my problem based on those answers.

I have a list of patterns, list_patterns, and I want an efficient way to search for a match against an input_string, so I join all of the patterns into a single alternation (which I expect to be much more efficient than looping over the patterns and checking each one). However, I am less interested in whether a match exists than in which pattern matched my input string. The code below illustrates what I want:

import re
input_string = 'foobar 11 the'
# Raw strings keep \d from being treated as an (invalid) string escape
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
joined_patterns = r'|'.join(list_patterns)

print(joined_patterns)
# OUT:  ^foobar \d+$|^foobar [a-z]+$|^foobar \d+ [a-z]+$

compiled_patterns = re.compile(joined_patterns)

print(compiled_patterns.search(input_string).span())
# OUT: (0, 13)

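# Desired: a method that returns the index of the pattern that matched,
# here the third pattern (index 2). pattern_group() is hypothetical and
# does not exist on match objects, so the call is commented out.
# print(compiled_patterns.search(input_string).pattern_group())
# OUT: 2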
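
For reference, the per-pattern loop that the question wants to avoid might look like the sketch below. The helper name first_matching_index is an assumption introduced for illustration; it is not part of the original question.

```python
import re

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

# Compile each pattern once so the loop does not recompile on every call
compiled = [re.compile(p) for p in list_patterns]

def first_matching_index(text):
    """Return the index of the first pattern that matches text, or None."""
    for i, pattern in enumerate(compiled):
        if pattern.search(text):
            return i
    return None

print(first_matching_index('foobar 11 the'))
# OUT: 2
```

This trivially answers "which pattern matched", at the cost of one search per pattern.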

Wrap each pattern in its own capturing group, then find which group is not empty.

import re

input_string = 'foobar 11 the'
# Raw strings keep \d from being treated as an (invalid) string escape
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

joined_patterns = '(' + r')|('.join(list_patterns) + ')'
compiled_patterns = re.compile(joined_patterns)

print(compiled_patterns.pattern)
# (^foobar \d+$)|(^foobar [a-z]+$)|(^foobar \d+ [a-z]+$)

match = compiled_patterns.match(input_string)

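# Only the alternative that matched has a non-None group.
# This assumes a match was found; match is None otherwise.
i = next(i for i, g in enumerate(match.groups()) if g is not None)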
matching_pattern = list_patterns[i]

print(matching_pattern)
# ^foobar \d+ [a-z]+$
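
A possible variation, sketched here under the assumption that none of the patterns contains capturing groups of its own: the match object's lastindex attribute already holds the number of the group that matched, so the scan over groups() can be skipped. The function name matching_pattern_index is hypothetical, and the sketch also handles the no-match case.

```python
import re

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

# Assumption: no pattern has capturing groups of its own, so group
# number k corresponds to list_patterns[k - 1].
compiled_patterns = re.compile('|'.join(f'({p})' for p in list_patterns))

def matching_pattern_index(text):
    """Return the index of the alternative that matched, or None."""
    match = compiled_patterns.match(text)
    if match is None:
        return None
    # lastindex is the 1-based number of the last matched capturing group
    return match.lastindex - 1

print(matching_pattern_index('foobar 11 the'))
# OUT: 2
print(matching_pattern_index('no match here'))
# OUT: None
```

If a pattern does contain inner capturing groups, the group numbers shift and this mapping no longer holds; named groups avoid that problem.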

You could encapsulate your logic in a small class:

import re

input_string = 'foobar 11 the'

class MatchPattern:
    list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
    joined_patterns = ''

    def __init__(self):
        joined = "|".join(rf"(?P<group_{idx}>{pattern})" for idx, pattern in enumerate(self.list_patterns))
        self.joined_patterns = re.compile(joined)

    def match(self, string):
        m = self.joined_patterns.search(string)
        if m:
            # Compare against None so that an empty-string match still counts
            group = [name for name, value in m.groupdict().items() if value is not None][0]
            _, idx = group.split("_")
            return (group, self.list_patterns[int(idx)])
        return None

mp = MatchPattern()
group = mp.match(input_string)
print(group)
# ('group_2', '^foobar \\d+ [a-z]+$')
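
As a shorter alternative to filtering groupdict(), the match object's lastgroup attribute gives the name of the last matched named group directly. The sketch below assumes the same group_N naming scheme as above and that the patterns contain no named groups of their own; the function name matching_pattern is hypothetical.

```python
import re

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']

# One named group per pattern: group_0, group_1, group_2
joined = '|'.join(f'(?P<group_{i}>{p})' for i, p in enumerate(list_patterns))
compiled_patterns = re.compile(joined)

def matching_pattern(text):
    """Return (group name, pattern) for the alternative that matched, or None."""
    m = compiled_patterns.search(text)
    if m is None:
        return None
    name = m.lastgroup  # name of the last matched named group
    idx = int(name.rsplit('_', 1)[1])
    return name, list_patterns[idx]

print(matching_pattern('foobar 11 the'))
# OUT: ('group_2', '^foobar \\d+ [a-z]+$')
```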
