Why is my Python regular expression pattern running so slowly?

Question

Please see my regular expression pattern code:

#!/usr/bin/env python
# -*- coding:utf-8 -*-

import re

print('Start')
str1 = 'abcdefgasdsdfswossdfasdaef'
# Want to match something like 'Moto 360x'
m = re.match(r"([A-Za-z\-\s\:\.]+)+(\d+)\w+", str1)
print(m)  # None is expected.
print('Done')

It takes 49 seconds to finish. Is there a problem with the pattern?

Answer 1

See Runaway Regular Expressions: Catastrophic Backtracking.

In brief, if there are extremely many ways to split a substring among the parts of the regex, the regex matcher may end up trying them all.

Constructs like (x+)+ and x+x+ practically guarantee this behaviour.
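As a rough illustration of how quickly the number of combinations grows, the sketch below counts the ways a run of n matching characters can be cut into consecutive non-empty chunks, which is exactly what (x+)+ permits. It is a simplified model, not a trace of what Python's re module actually does, and count_splits is a hypothetical helper name.

```python
def count_splits(n):
    """Count ways to cut a run of n characters into one or more non-empty chunks."""
    if n == 0:
        return 1  # nothing left to cut: one complete split found
    # The first chunk can take 1..n characters; recurse on the remainder.
    return sum(count_splits(n - first) for first in range(1, n + 1))


for n in (1, 2, 3, 4, 10):
    print(n, count_splits(n))

# Closed form for n >= 1 is 2 ** (n - 1); for the 26-letter string in the question:
print(2 ** (26 - 1))
```

The count doubles with every extra character, so the 26-letter string from the question admits 2^25 = 33,554,432 such splits of its full length alone.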

To detect and fix the problematic constructs, the following concept can be used:

At a conceptual level, the presence of a problematic construct means that your regex is ambiguous, i.e. if you disregard greedy/lazy behaviour, there is no single "correct" way to split some text into the parts of the regex (or, equivalently, of one of its subexpressions). So, to avoid or fix the problem, you need to find and eliminate all ambiguities.
- One way to do this is to
  - always split the text into its meaningful parts (= parts that have separate meanings for the task at hand), and
  - define the parts in such a way that they cannot be confused (= using the same characteristics you would use to tell them apart if you were parsing the text by hand).
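A minimal sketch of that approach, applied to the 'Moto 360x' example from the question. The group names and the rule that the suffix must start with a letter are assumptions made for illustration; they are not part of the original pattern.

```python
import re

# Assumption for illustration: a product string is "<name><number><suffix>",
# where the suffix starts with a letter so it cannot be mistaken for a digit.
PRODUCT = re.compile(
    r"(?P<name>[A-Za-z\-\s:.]+)"   # name: letters and separators, never digits
    r"(?P<number>\d+)"             # number: digits only
    r"(?P<suffix>[A-Za-z]\w*)"     # suffix: must begin with a letter
)

m = PRODUCT.match("Moto 360x")
print(repr(m.group("name")), m.group("number"), m.group("suffix"))

# No digits anywhere, so the match fails after a handful of attempts.
print(PRODUCT.match("abcdefgasdsdfswossdfasdaef"))
```

Because no character can belong to two adjacent parts, each input has at most one way to be split, and the matcher has nothing to explore when a match fails.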

Answer 2

Reposting the answer and solution from the comments by nhahtdh and Marc B:

([A-Za-z\-\s\:\.]+)+ --> [A-Za-z\-\s\:\.]+
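For reference, a short sketch of the question's script with that substitution applied. The variable name fixed is hypothetical. Note that dropping the outer group also makes (\d+) group 1.

```python
import re

str1 = 'abcdefgasdsdfswossdfasdaef'

# Same pattern as in the question, with the nested quantifier removed.
fixed = re.compile(r"[A-Za-z\-\s\:\.]+(\d+)\w+")

# No digits in str1, so this fails quickly instead of backtracking at length.
print(fixed.match(str1))

m = fixed.match('Moto 360x')
print(m.group(0), m.group(1))
```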

Thanks so much to nhahtdh and Marc B!

2 answers

Answer 1
Score 7, accepted, 2014-12-12 16:54:33

Answer 2
Score 0, 2014-12-12 16:51:32
