Regex: How to get span or group of *pattern* which matches input string
Note: This question is similar to this and this, but I was unable to resolve my problem based on those answers.
I have a list of patterns, list_patterns, and I want an efficient way to search for a match against an input_string, so I join all of the patterns together (I expect this to be much more efficient than looping through all of the patterns and checking each one for a match). However, I am less interested in whether a match exists than in which pattern matches my input string. The code below illustrates what I want:
import re
input_string = 'foobar 11 the'
# Raw strings keep backslash escapes such as \d intact for the regex engine.
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
joined_patterns = '|'.join(list_patterns)
print(joined_patterns)
# OUT: ^foobar \d+$|^foobar [a-z]+$|^foobar \d+ [a-z]+$
compiled_patterns = re.compile(joined_patterns)
print(compiled_patterns.search(input_string).span())
# OUT: (0, 13)
# Desired (hypothetical) method: would return the index of the matching
# pattern, here the third one (index 2). No such method exists in re.
print(compiled_patterns.search(input_string).pattern_group())
# OUT: 2
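For reference, the loop-based baseline that the question wants to avoid can be sketched as follows. This is only an illustration of the desired result (the index of the first pattern that matches); the helper name first_matching_index is made up for this example and is not part of the re module.

```python
import re

def first_matching_index(patterns, text):
    """Return the index of the first pattern that matches text, or None."""
    # Try each pattern in order and stop at the first one that matches.
    for i, pattern in enumerate(patterns):
        if re.search(pattern, text):
            return i
    return None

list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
print(first_matching_index(list_patterns, 'foobar 11 the'))
# 2
```

Any joined-pattern solution should return the same index as this loop for the same input.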
Group the patterns, then find which group is not empty.
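import re
input_string = 'foobar 11 the'
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
# Wrap every pattern in its own capturing group.
joined_patterns = '(' + ')|('.join(list_patterns) + ')'
compiled_patterns = re.compile(joined_patterns)
print(compiled_patterns.pattern)
# (^foobar \d+$)|(^foobar [a-z]+$)|(^foobar \d+ [a-z]+$)
match = compiled_patterns.match(input_string)
# Only the group of the alternative that matched is not None.
# This assumes a match exists and that the patterns contain no capturing
# groups of their own (otherwise the group index and list index diverge).
i = next(i for i, g in enumerate(match.groups()) if g is not None)
matching_pattern = list_patterns[i]
print(matching_pattern)
# ^foobar \d+ [a-z]+$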
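If the individual patterns contain capturing groups of their own, positions in match.groups() no longer line up with positions in list_patterns. One possible variation, sketched here under the assumption that each pattern is wrapped in a uniquely named group, is to read Match.lastgroup instead. The helper name which_pattern and the group names p0, p1, ... are made up for this example, and the third pattern has been given inner groups purely to demonstrate the point.

```python
import re

def which_pattern(patterns, text):
    """Return (index, match) for the alternative that matched, or None."""
    # Wrap each pattern in a uniquely named group: (?P<p0>...)|(?P<p1>...)|...
    combined = re.compile(
        "|".join(f"(?P<p{i}>{p})" for i, p in enumerate(patterns))
    )
    m = combined.search(text)
    if m is None:
        return None
    # lastgroup is the name of the last matched capturing group. A wrapper
    # group closes after any groups nested inside it, so this is the wrapper.
    return int(m.lastgroup[1:]), m

# The third pattern has inner capturing groups (an assumption for this demo).
list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar (\d+) ([a-z]+)$']
idx, m = which_pattern(list_patterns, 'foobar 11 the')
print(idx, m.span())
# 2 (0, 13)
```

The same idea appears in the tokenizer example in the Python re documentation, which dispatches on lastgroup.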
You could encapsulate your logic in a small class:
import re
input_string = 'foobar 11 the'

class MatchPattern:
    list_patterns = [r'^foobar \d+$', r'^foobar [a-z]+$', r'^foobar \d+ [a-z]+$']
    joined_patterns = None  # compiled in __init__

    def __init__(self):
        # One named group per pattern: (?P<group_0>...)|(?P<group_1>...)|...
        joined = "|".join(rf"(?P<group_{idx}>{pattern})" for idx, pattern in enumerate(self.list_patterns))
        self.joined_patterns = re.compile(joined)

    def match(self, string):
        m = self.joined_patterns.search(string)
        if m:
            # Compare against None so an alternative that matched the empty string still counts.
            group = [name for name, value in m.groupdict().items() if value is not None][0]
            _, idx = group.split("_")
            return (group, self.list_patterns[int(idx)])
        else:
            return None

mp = MatchPattern()
group = mp.match(input_string)
print(group)
# ('group_2', '^foobar \\d+ [a-z]+$')