Regular expression hangs - Java matcher

String:

Aqua, Sodium Laureth Sulfate, Sodium Lauryl Sulfate, Dimethicone, Cocamide MEA, Zinc Carbonate, Glycol Distearate, Sodium Chloride, Zinc Pyrithione, Sodium Xylenesulfonate, Cetyl Alcohol, Parfum, Guar Hydroxypropyltrimonium Chloride, Magnesium Sulfate, Sodium Benzoate, Ammonium Laureth Sulfate, Magnesium Carbonate Hydroxide, Linalool, Butylphenyl Methylpropional, Limonene, Hydroxyisohexyl 3-Cyclohexene Carboxaldehyde, Benzyl Alcohol, Hexyl Cinnamal, Citronellol, Tocopheryl Acetate, Paraffinum Liquidum, Sodium Polynaphthalenesulfonate, CI 19140, DMDM Hydantoin, CI 15510, Methylchloroisothiazolinone, Disodium EDTA, Tetrasodium EDTA, Methylisothiazolinone.

Current Regex:

System.out.println(string.matches("([\\W]*\\b[A-Z\\d]\\w+\\b[\\W]*)+"));

The Java application hangs. I can't find the error in the regex. By googling, I found out that this might be called "catastrophic backtracking"? The regex should match the string only if every word starts with an uppercase letter (or a digit). If, for example, one word is lowercase, it should not match.

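I recommend splitting your input string into words and then checking each word. Even simpler: if you just want to test that each word starts with an uppercase letter (or, as your [A-Z\d] class allows, a digit), you don't need a regular expression at all, for example:

for (String s : string.split("\\W+")) {
  if (s.isEmpty()) continue;  // a leading separator yields an empty first token
  char c = s.charAt(0);
  // accept an uppercase ASCII letter or a digit, like [A-Z\d] in the question
  boolean ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
  if (!ok) {
    return false;
  }
}
return true;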

This should be a lot faster, and you can even report the word that failed if you need to.
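A self-contained version of the split-based check might look like the sketch below. The class and method names (CapitalizedWords, badWords, allWordsCapitalized) are hypothetical, and it assumes, like the [A-Z\d] class in the question, that a word may also start with a digit (as in "CI 19140"). Collecting the failing words into a list is one way to report which word broke the rule.

```java
import java.util.ArrayList;
import java.util.List;

public class CapitalizedWords {

    // Hypothetical helper: returns every word whose first character is neither
    // an ASCII uppercase letter nor a digit. An empty list means the input passes.
    static List<String> badWords(String text) {
        List<String> bad = new ArrayList<>();
        for (String word : text.split("\\W+")) {
            if (word.isEmpty()) continue; // a leading separator yields an empty token
            char c = word.charAt(0);
            boolean ok = (c >= 'A' && c <= 'Z') || (c >= '0' && c <= '9');
            if (!ok) bad.add(word);
        }
        return bad;
    }

    static boolean allWordsCapitalized(String text) {
        return badWords(text).isEmpty();
    }

    public static void main(String[] args) {
        String ok = "Aqua, Sodium Laureth Sulfate, CI 19140, Disodium EDTA.";
        String notOk = "Aqua, sodium Laureth Sulfate, Disodium EDTA.";
        System.out.println(allWordsCapitalized(ok));    // true
        System.out.println(allWordsCapitalized(notOk)); // false
        System.out.println(badWords(notOk));            // [sodium]
    }
}
```

Splitting on "\\W+" rather than "\\W" avoids the empty tokens that consecutive separators such as ", " would otherwise produce, which would make charAt(0) throw.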

Perhaps what you had in mind was

// a word may start with an uppercase letter or a digit (e.g. "CI 19140", "3-Cyclohexene")
String regex = "([A-Z\\d][-\\w]+( [A-Z\\d][-\\w]+)*, )*[A-Z\\d][-\\w]+( [A-Z\\d][-\\w]+)*\\.";
System.out.println(string.matches(regex));

returns true.
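Because String.matches compiles the regular expression on every call, a version that compiles it once with java.util.regex.Pattern may be preferable if the check runs often. The following is a sketch; the class, constant, and method names are hypothetical, and the word pattern assumes that words may start with an uppercase letter or a digit and may contain hyphens, so that "CI 19140" and "3-Cyclohexene" are accepted.

```java
import java.util.regex.Pattern;

public class IngredientList {

    // One word: an uppercase letter or digit, then one or more word characters or hyphens.
    private static final String WORD = "[A-Z\\d][-\\w]+";
    // One item: words separated by single spaces.
    private static final String ITEM = WORD + "(?: " + WORD + ")*";
    // Whole list: items separated by ", " and terminated by a period.
    private static final Pattern LIST =
            Pattern.compile("(?:" + ITEM + ", )*" + ITEM + "\\.");

    static boolean isCapitalizedList(String text) {
        return LIST.matcher(text).matches();
    }

    public static void main(String[] args) {
        System.out.println(isCapitalizedList(
                "Aqua, Sodium Laureth Sulfate, CI 19140, Hydroxyisohexyl 3-Cyclohexene Carboxaldehyde.")); // true
        System.out.println(isCapitalizedList("Aqua, sodium Laureth Sulfate.")); // false
    }
}
```

Building the pattern from named pieces keeps each part (word, item, list) readable, and because spaces, commas, and the final period are not in [-\w], each separator can only be matched in one place.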

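The problem with your regex is that it's overly complicated. Because [\W]* appears both at the end of one repetition and at the start of the next, every run of separators (such as ", ") can be divided between the two in several ways. When the overall match fails, for example at "3-Cyclohexene" in your list (where \w+ needs at least one word character after the 3), the engine retries every combination of those splits, and their number grows exponentially with the number of words: that is the catastrophic backtracking you found. Another disadvantage of adding expressions until you get true is that the pattern can match things you didn't have in mind: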

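import java.nio.charset.StandardCharsets;
import java.util.Random;

Random rand = new Random();
while (true) {
    byte[] bytes = new byte[40];
    rand.nextBytes(bytes);
    // clear the high bit so every byte is 7-bit ASCII (control characters included)
    for (int i = 0; i < bytes.length; i++) bytes[i] &= 0x7F;
    String string = new String(bytes, StandardCharsets.US_ASCII);
    // print every random string that your pattern accepts
    if (string.matches("([\\W]*\\b[A-Z\\d]\\w+\\b[\\W\\d]*)+"))
        System.out.println(string);
}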

prints things such as

"^;%XX`'SwJ|[*4"*0C<Tgbom_. \^
{PvU_y9aJSm?08EL(   NpfA9a[:$YbN8VTtMk
;![`LR7Yy\AO5PZ@X4}GajC<*XvKE11
8l5W6*IDNH[9C'@.>7`LHsCN*,{26O}
EFJ5MBVxi%W_t6v54EmLmgjFvqyYh\<"
+7]|ULh2[MT`Yx{MKH4N
'8p!2mf

whereas the expression I gave matches

KfhBuGv7, S3.
IWzu, XHop4Z.
LJbXfrd, PdR.
V2dxQV, LA9z.
HKf37cy0, TS.
RAw2E5a, Ajs.
Up-, GPQ7 I_.
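If you would rather keep the word-by-word pattern from the question, one option (a swapped-in technique, not something proposed above) is to make the separator runs possessive with *+. A possessive quantifier never gives characters back, so each run of non-word characters is consumed entirely by the trailing [\W]*+ of one repetition and the next repetition's leading [\W]*+ matches nothing. The pattern should accept the same strings as before, but a failing input now fails after roughly linear work instead of retrying every way to split the separators. The class name and test strings are illustrative assumptions, and String.repeat requires Java 11 or later.

```java
import java.util.regex.Pattern;

public class PossessiveSeparators {

    // The question's pattern with possessive separator runs ([\W]*+ instead of [\W]*),
    // so a run of non-word characters cannot be split between two repetitions.
    private static final Pattern CAPITALIZED =
            Pattern.compile("([\\W]*+\\b[A-Z\\d]\\w+\\b[\\W]*+)+");

    static boolean allCapitalized(String text) {
        return CAPITALIZED.matcher(text).matches();
    }

    public static void main(String[] args) {
        // Many capitalized items followed by one lowercase word: with the original
        // [\W]* version, this kind of failing input is where backtracking explodes.
        String longFailing = "Sodium Chloride, ".repeat(200) + "lowercase.";
        System.out.println(allCapitalized("Aqua, Sodium Laureth Sulfate.")); // true
        System.out.println(allCapitalized(longFailing));                     // false
    }
}
```

An atomic group (?>...) around each separator run would have the same effect; possessive quantifiers are simply the shorter spelling here.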

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please credit the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM