String:
Aqua, Sodium Laureth Sulfate, Sodium Lauryl Sulfate, Dimethicone, Cocamide MEA, Zinc Carbonate, Glycol Distearate, Sodium Chloride, Zinc Pyrithione, Sodium Xylenesulfonate, Cetyl Alcohol, Parfum, Guar Hydroxypropyltrimonium Chloride, Magnesium Sulfate, Sodium Benzoate, Ammonium Laureth Sulfate, Magnesium Carbonate Hydroxide, Linalool, Butylphenyl Methylpropional, Limonene, Hydroxyisohexyl 3-Cyclohexene Carboxaldehyde, Benzyl Alcohol, Hexyl Cinnamal, Citronellol, Tocopheryl Acetate, Paraffinum Liquidum, Sodium Polynaphthalenesulfonate, CI 19140, DMDM Hydantoin, CI 15510, Methylchloroisothiazolinone, Disodium EDTA, Tetrasodium EDTA, Methylisothiazolinone.
Current Regex:
System.out.println(string.matches("([\\W]*\\b[A-Z\\d]\\w+\\b[\\W]*)+"));
Java application hangs up. I can't find the error in the RegEx. By googeling I found out that this could be called "catastrophic backtracking"!? The Regex should match the String if it only contains uppercase words if for example 1 word is lower case in should not match it.
I recommend you split your input string by word and then pattern match it, event simpler: not to pattern match if you just want to test that the first letter of each word is uppercase, like:
for (String s : string.split("\\W")) {
if (s.charAt(0) < 'A' || s.charAt(0) > 'Z') {
return false;
}
}
Sounds a lot faster to me (and you can even have the word that failed if you need).
Perhaps what you had in mind was
String regex = "([A-Z][\\d\\w]+( [A-Z][-\\d\\w]+)*, )*[A-Z][-\\d\\w]+( [A-Z][-\\d\\w]+)*\\.";
System.out.println(string.matches(regex));
returns true.
The problem you have with the regex is that its overly complicated. The disadvantage with adding expressions until you get true
is that it can match things you didn't have in mind.
Random rand = new Random();
while(true) {
byte[] bytes = new byte[40];
rand.nextBytes(bytes);
for (int i = 0; i < bytes.length; i++) bytes[i] &= 0x7F;
String string = new String(bytes, 0);
if (string.matches("([\\W]*\\b[A-Z\\d]\\w+\\b[\\W\\d]*)+"))
System.out.println(string);
}
prints things such as
"^;%XX`'SwJ|[*4"*0C<Tgbom_. \^
{PvU_y9aJSm?08EL( NpfA9a[:$YbN8VTtMk
;![`LR7Yy\AO5PZ@X4}GajC<*XvKE11
8l5W6*IDNH[9C'@.>7`LHsCN*,{26O}
EFJ5MBVxi%W_t6v54EmLmgjFvqyYh\<"
+7]|ULh2[MT`Yx{MKH4N
'8p!2mf
whereas the expression I gave matches
KfhBuGv7, S3.
IWzu, XHop4Z.
LJbXfrd, PdR.
V2dxQV, LA9z.
HKf37cy0, TS.
RAw2E5a, Ajs.
Up-, GPQ7 I_.
The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.