How to optimize regular expression match searching in Python

Question

The Program

I am building a program that tracks which feature file steps are covered by a step definition. For example, I may have a feature step such as I should not click on the panel. This feature step matches the step definition I {qualifier} click on the {place}, assuming that {qualifier} maps to (should not|should) and {place} maps to (panel|page).

For every feature step that matches a step definition, I want to keep track of which step definition it actually matched. So I need a connection between I should not click on the panel and I {qualifier} click on the {place}.

For every feature step that does not match any of the step definitions, I am going to generate a step definition and connect those two.

The Problem

Right now I take every step definition and convert it into a regular expression, something like...

I {qualifier} click on the {place} will be converted to (I (should not|should) click on the (panel|page))

I am using a Python dictionary where the key is the converted regular expression and the value is the original step definition.
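For illustration, the conversion and the dictionary described above could be sketched roughly as follows. This is only a guess at the setup, not the actual code from the program: the PLACEHOLDERS table, the definition_to_regex helper, and the step_definitions list are hypothetical names introduced here, and the placeholder values are taken from the example above.

```python
import re

# Hypothetical placeholder table; the values come from the example in the question.
PLACEHOLDERS = {
    "qualifier": ["should not", "should"],
    "place": ["panel", "page"],
}

def definition_to_regex(step_definition, placeholders=PLACEHOLDERS):
    """Convert e.g. 'I {qualifier} click on the {place}' into a regex string."""
    # Splitting on a capturing group yields: literal, name, literal, name, ...
    parts = re.split(r"\{(\w+)\}", step_definition)
    pieces = []
    for index, part in enumerate(parts):
        if index % 2 == 0:
            # Literal text: escape it so it is matched verbatim.
            pieces.append(re.escape(part))
        else:
            # Placeholder name: expand into a group of alternatives.
            alternatives = "|".join(re.escape(a) for a in placeholders[part])
            pieces.append("(" + alternatives + ")")
    return "(" + "".join(pieces) + ")"

# Hypothetical list of step definitions.
step_definitions = ["I {qualifier} click on the {place}"]

# Key: converted regular expression, value: original step definition.
regex_to_step_definition_map = {
    definition_to_regex(definition): definition for definition in step_definitions
}
all_step_definition_regex = list(regex_to_step_definition_map)
```

The literal parts go through re.escape, so the generated string may contain escaped spaces and will not look character-for-character like the hand-written example, but it should match the same feature steps.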

My problem arises when I go through every single feature step and try to connect it to its matching step definition. I am currently just looping through every single regular expression and trying to match it against the feature step, something like this...

# every feature_step gets sent through this check

for regex in all_step_definition_regex:
    if re.match(regex, feature_step):
        step_definition = regex_to_step_definition_map[regex]
        return True, step_definition

return False, None

This takes an incredibly long time to run when every feature step has to be checked against each individual regular expression. One way to speed up the initial check is to join every regular expression together with an 'or', like re.match('|'.join(all_step_definition_regex), feature_step), but then I have no way to connect the feature step with its matching step definition without looping back through all the individual regular expressions.

I was wondering if anyone has any idea how to speed up this process?

Answer 1

You can make each definition pattern a group and then see which group matched, although you'll need to change your individual regexes to use non-capturing groups (?:...) (which would be more efficient in any case if you're not using the captured information):

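import re

# Wrap each step-definition regex in its own capturing group and join them with '|'.
definition_regex = re.compile(r'(' + r')|('.join(all_step_definition_regex) + r')')

def find_definition(feature_step):
    match = definition_regex.match(feature_step)
    if match is None:
        return None
    # lastindex is 1-based; subtract 1 to get the position in all_step_definition_regex.
    return match.lastindex - 1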
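To connect this back to the dictionary from the question, here is a minimal, self-contained sketch of how the returned index could be turned into the original step definition. The sample data and the match_step name are made up for illustration, and the second step definition is purely hypothetical. It assumes every individual regex uses only non-capturing groups, so that the group numbers of the combined pattern line up with positions in the list.

```python
import re

# Hypothetical data: regex with non-capturing inner groups -> original step definition.
regex_to_step_definition_map = {
    r"I (?:should not|should) click on the (?:panel|page)":
        "I {qualifier} click on the {place}",
    r"I (?:open|close) the (?:menu|dialog)":
        "I {action} the {widget}",
}

# Fix the order once so that group N corresponds to list position N - 1.
all_step_definition_regex = list(regex_to_step_definition_map)

# One capturing group per step definition, all joined with '|'.
definition_regex = re.compile(
    "|".join("(" + regex + ")" for regex in all_step_definition_regex)
)

def match_step(feature_step):
    """Return (True, step_definition) on a match, otherwise (False, None)."""
    match = definition_regex.match(feature_step)
    if match is None:
        return False, None
    # With non-capturing inner groups, lastindex is the number of the
    # top-level group (1-based) that matched.
    regex = all_step_definition_regex[match.lastindex - 1]
    return True, regex_to_step_definition_map[regex]
```

As in the original loop, match only anchors at the start of the string, so a feature step with extra trailing text still counts as a match. If that is not wanted, definition_regex.fullmatch could be used instead.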
