I'm using regex pattern matching on strings in Java. The problem is that the CPU spikes and the program makes no progress while trying to match the patterns. I have hundreds of strings, each of which needs to be checked against 2 patterns.
Below is the sample code I'm using. It hangs with the CPU at 100% on the first string in patternList when matching it against the second pattern, i.e. patternMatch[1]. How can I make this better?
String[] patternMatch = {
    "([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)",
    "([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)+([+\\-/*])+([\\w\\s]+)"
};
List<String> patternList = new ArrayList<String>();
patternList.add("Avg Volume Units product A + Volume Units product A");
patternList.add("Avg Volume Units / Volume Units product A");
patternList.add("Avg retailer On Hand / Volume Units Plan / Store Count");
patternList.add("Avg Hand Volume Units Plan Store Count");
patternList.add("1 - Avg merchant Volume Units");
patternList.add("Total retailer shipment Count");
for (String s : patternList) {
    for (int i = 0; i < patternMatch.length; i++) {
        Pattern pattern = Pattern.compile(patternMatch[i]);
        Matcher matcher = pattern.matcher(s);
        System.out.println(s);
        if (matcher.matches()) {
            System.out.println("Passed");
        } else {
            System.out.println("Failed;");
        }
    }
}
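It looks like you are facing a variation of catastrophic backtracking, probably caused by ([\\w\\s]+)+. The outer + lets the engine split the same run of word and space characters between iterations in a huge number of ways, and it tries them all before giving up on a match. Try using ([\\w\\s]+) instead: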
String[] patternMatch = {
"([\\w\\s]+)([+\\-/*])+([\\w\\s]+)",
"([\\w\\s]+)([+\\-/*])+([\\w\\s]+)([+\\-/*])+([\\w\\s]+)"
};
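To try the corrected patterns against the sample strings from the question, something like the following could be used. This is only a sketch: the class name FixedPatternDemo and the method classify are made up for illustration, and the patterns are compiled once as constants rather than inside the loop, which is a separate small improvement.

```java
import java.util.regex.Pattern;

public class FixedPatternDemo {
    // Corrected patterns: the outer '+' on the word/space groups is removed.
    // Compiled once and reused for every input string.
    private static final Pattern ONE_OP =
            Pattern.compile("([\\w\\s]+)([+\\-/*])+([\\w\\s]+)");
    private static final Pattern TWO_OPS =
            Pattern.compile("([\\w\\s]+)([+\\-/*])+([\\w\\s]+)([+\\-/*])+([\\w\\s]+)");

    // Hypothetical helper: reports which of the two patterns matches the whole string.
    static String classify(String s) {
        if (ONE_OP.matcher(s).matches()) {
            return "one operator";
        }
        if (TWO_OPS.matcher(s).matches()) {
            return "two operators";
        }
        return "no match";
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Avg Volume Units product A + Volume Units product A",
            "Avg Volume Units / Volume Units product A",
            "Avg retailer On Hand / Volume Units Plan / Store Count",
            "Avg Hand Volume Units Plan Store Count",
            "1 - Avg merchant Volume Units",
            "Total retailer shipment Count"
        };
        for (String s : inputs) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

With the nested quantifier gone, each string should be classified without the hang described in the question.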
@Pshemo is probably right regarding the catastrophic backtracking. However, I would suggest a completely different approach using String.split() with zero-width lookbehind and lookahead, so that the string is split just after and just before each operator (+, -, *, /).
String[] x = s.split("((?<=[\\-\\+\\*/])|(?=[\\-\\+\\*/]))");
if (x.length == 3 || x.length == 5)
    System.out.println("Passed");
else
    System.out.println("Failed");
The split returns an array containing the operators at odd offsets (1, 3) and the strings between the operators at even offsets (0, 2 and 4). This should be much faster than a regex with backtracking.
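A self-contained version of the split approach might look like the sketch below. The class name SplitDemo and the helpers tokens and isExpression are hypothetical names, and the character class is written as [-+*/], which should be equivalent to the escaped form used above.

```java
import java.util.Arrays;

public class SplitDemo {
    // Split at the zero-width positions just after or just before an operator,
    // so the operators are kept as separate tokens.
    static String[] tokens(String s) {
        return s.split("(?<=[-+*/])|(?=[-+*/])");
    }

    // 3 tokens: operand, operator, operand.
    // 5 tokens: operand, operator, operand, operator, operand.
    static boolean isExpression(String s) {
        int n = tokens(s).length;
        return n == 3 || n == 5;
    }

    public static void main(String[] args) {
        String[] inputs = {
            "Avg Volume Units / Volume Units product A",
            "Avg retailer On Hand / Volume Units Plan / Store Count",
            "Total retailer shipment Count"
        };
        for (String s : inputs) {
            System.out.println(Arrays.toString(tokens(s))
                    + " -> " + (isExpression(s) ? "Passed" : "Failed"));
        }
    }
}
```

Keep in mind that this check only counts tokens. It does not verify that the operators actually sit at the odd offsets, so stricter validation may still be needed for untrusted input.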
I don't think there is a need to quantify a quantified unitary group.
For example, (?:(?:X)+)* is simply equivalent to X*.
Quantifying an already quantified unitary group this way is what causes the exponential backtracking.
As a model, (?:(?:X))* would be better; it won't by itself cause catastrophic backtracking.
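As a quick sanity check of that equivalence, the sketch below compares the nested form against the flat form, taking X to be the literal a. The class name EquivalenceDemo and the helper sameResult are made up for illustration, and the inputs are deliberately short so the nested form cannot blow up.

```java
import java.util.regex.Pattern;

public class EquivalenceDemo {
    // Returns true when the nested and the flat pattern agree on the input.
    static boolean sameResult(String input) {
        boolean nested = Pattern.matches("(?:(?:a)+)*", input);
        boolean flat = Pattern.matches("a*", input);
        return nested == flat;
    }

    public static void main(String[] args) {
        // Short inputs only: the nested form can backtrack heavily on long ones.
        String[] inputs = {"", "a", "aaaa", "aab", "b"};
        for (String s : inputs) {
            System.out.println("\"" + s + "\" same result: " + sameResult(s));
        }
    }
}
```

Both forms accept the same strings; the difference is only in how much work the engine may do before it reports a failure.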
The other problem is that you should try to refrain from grouping unitary constructs altogether.
In your sample, each character class is an example of a unitary (base) construct.
Also, use clustering (?:,,) instead of capturing (,,) if you can.
A construct like ([+\\-/*])+ will match one or more of the characters in that class, but will only capture the last one.
So the capture group is of no real use as either a grouping or a capture.
So, if you follow these rules and keep the capture groups, the new regexes would look like this:
# "([\\w\\s]+)([+\\-/*]+)([\\w\\s]+)"
( [\w\s]+ ) # (1)
( [+\-/*]+ ) # (2)
( [\w\s]+ ) # (3)
and
# "([\\w\\s]+)([+\\-/*]+)([\\w\\s]+)([+\\-/*]+)([\\w\\s]+)"
( [\w\s]+ ) # (1)
( [+\-/*]+ ) # (2)
( [\w\s]+ ) # (3)
( [+\-/*]+ ) # (4)
( [\w\s]+ ) # (5)
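With the quantifier moved inside the capture groups, the groups become usable for pulling the expression apart. Here is a minimal sketch using the first regex; the class OperandExtractor and the method parts are illustrative names, and trimming the operands is an assumption about what the caller wants.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OperandExtractor {
    // Groups: (1) left operand, (2) operator run, (3) right operand.
    private static final Pattern ONE_OP =
            Pattern.compile("([\\w\\s]+)([+\\-/*]+)([\\w\\s]+)");

    // Returns {left, operator, right}, or an empty array when the input does not match.
    static String[] parts(String s) {
        Matcher m = ONE_OP.matcher(s);
        if (!m.matches()) {
            return new String[0];
        }
        return new String[] { m.group(1).trim(), m.group(2), m.group(3).trim() };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parts("Avg Volume Units / Volume Units product A")));
        System.out.println(Arrays.toString(parts("Total retailer shipment Count")));
    }
}
```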