Here is my regular expression code:
#!/usr/bin/env python
# -*- coding:utf-8 -*-
import re
print('Start')
str1 = 'abcdefgasdsdfswossdfasdaef'
m = re.match(r"([A-Za-z\-\s\:\.]+)+(\d+)\w+", str1) # Want to match something like 'Moto 360x'
print(m) # None is expected.
print('Done')
It takes 49 seconds to finish. Is there a problem with the pattern?
See Runaway Regular Expressions: Catastrophic Backtracking.
In brief, if a substring can be split among the parts of the regex in an extremely large number of ways, the regex matcher may end up trying every one of them before it can report that there is no match.
Constructs like (x+)+ and x+x+ practically guarantee this behaviour.
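A quick way to see this behaviour on CPython is to time a nested construct against its flat equivalent on inputs that can never match. This is a sketch under stated assumptions: the pattern `(x+)+y`, the helper `time_match` and the input sizes are illustrative choices, not taken from the question, and absolute timings will vary by machine and Python version.

```python
import re
import time


def time_match(pattern, n):
    """Return (match_result, elapsed_seconds) for pattern against n 'x's.

    The subject never contains 'y', so a pattern ending in 'y' must fail,
    which forces a backtracking engine to exhaust every way it can split
    the input before giving up.
    """
    subject = "x" * n
    start = time.perf_counter()
    result = re.match(pattern, subject)
    return result, time.perf_counter() - start


for n in range(14, 21, 2):
    _, nested = time_match(r"(x+)+y", n)  # ambiguous: nested quantifiers
    _, flat = time_match(r"x+y", n)       # unambiguous equivalent
    print(f"n={n:2d}  (x+)+y: {nested:.4f}s   x+y: {flat:.6f}s")
```

Both patterns fail on every input here, but the time for `(x+)+y` should grow roughly exponentially with `n`, while `x+y` stays essentially flat.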
To detect and fix the problematic constructs, the following concept can be used:
At the conceptual level, the presence of a problematic construct means that your regex is ambiguous, i.e. if you disregard greedy/lazy behaviour, there is no single "correct" way to split some text among the parts of the regex (or, equivalently, among the parts of one of its subexpressions). So, to avoid or fix the problem, you need to find and eliminate every ambiguity.
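The ambiguity can be quantified for the simplest case, a run of n characters that all match the same atom. Splitting it into k non-empty pieces means choosing k - 1 of the n - 1 gaps between characters, so `x+` admits one split, `x+x+` admits n - 1, and `(x+)+` admits 2^(n-1). The sketch below (the helper `ways_to_split` is a hypothetical name, and `math.comb` needs Python 3.8+) computes these counts and then applies the same count, as a rough model rather than an exact tally of the engine's internal steps, to the 26-letter string from the question: every letter matches the character class and `(\d+)` can never succeed, so the outer `+` gets to try every split of every prefix.

```python
from math import comb


def ways_to_split(n, min_parts, max_parts):
    """Count the ways n characters can be cut into between min_parts and
    max_parts non-empty, consecutive pieces.

    Choosing k pieces means choosing k - 1 of the n - 1 interior cut points.
    """
    return sum(comb(n - 1, k - 1) for k in range(min_parts, min(max_parts, n) + 1))


n = 20
print("x+    :", ways_to_split(n, 1, 1))  # 1: a single, unambiguous split
print("x+x+  :", ways_to_split(n, 2, 2))  # n - 1 = 19
print("(x+)+ :", ways_to_split(n, 1, n))  # 2 ** (n - 1) = 524288

# Rough model of the question's failing match: sum the splits of each prefix.
str1 = 'abcdefgasdsdfswossdfasdaef'
attempts = sum(ways_to_split(k, 1, k) for k in range(1, len(str1) + 1))
print(attempts)  # 2 ** 26 - 1 = 67108863
```

Under this model the original pattern explores on the order of 67 million splits before giving up, which plausibly accounts for a run time of tens of seconds. `x+x+` is still ambiguous, but its count grows only linearly here, whereas the nested `(x+)+` grows exponentially.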
One way to do this is to
Reposting the answer and solution from the comments by nhahtdh and Marc B:
Replace ([A-Za-z\-\s\:\.]+)+ with [A-Za-z\-\s\:\.]+ (drop the redundant outer group and its +).
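A minimal check of the suggested fix, assuming Python 3 (the constant name `FAST` is only an illustrative label): the same non-matching subject now fails immediately, and the motivating example 'Moto 360x' still matches.

```python
import re

# Fix from the comments: the outer group and its + are removed, so each
# character can be consumed by the character class in only one way.
FAST = r"[A-Za-z\-\s\:\.]+(\d+)\w+"

str1 = 'abcdefgasdsdfswossdfasdaef'
print(re.match(FAST, str1))  # None, without the long delay

m = re.match(FAST, 'Moto 360x')
print(m.group(0), m.group(1))  # Moto 360x 360
```

Note that removing the parentheses shifts `(\d+)` from group 2 to group 1. If existing code relies on the old numbering, `([A-Za-z\-\s\:\.]+)(\d+)\w+` keeps the group without the ambiguous outer `+`.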
Thanks so much to nhahtdh and Marc B!