What does this regex syntax actually mean in Java?

Question

I wrote a program to detect palindromes. It works with what I have, but I stumbled upon another bit of syntax, and I would like to know exactly what it means.

This is the line of code I'm using:

    userString = userString.toLowerCase().replaceAll("[^a-zA-Z]", "");

I understand that the replaceAll code snippet means to "match characters ([...]) that are not (^) in the ranges a-z and A-Z (a-zA-Z)."

However, this worked as well:

    replaceAll("[^(\\p{L}')]", "");

I just don't understand how to translate that into English. I am completely new to regular expressions, and I find them quite fascinating. Thanks to anyone who can tell me what it means.

Answer 1

You should check this website: https://regex101.com

It helped me a lot when I was writing/testing/debugging some regexes ;)

It gives the following explanation:

[^(\\p{L}')] match a single character not present in the list below:

( the literal character (
\\p{L} matches any kind of letter from any language
') a single character in the list ') literally
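
If you want to check that reading against Java itself, here is a minimal sketch that tests single characters against the same class. The class name CharClassCheck and the helper isRemoved are made up for illustration; only Pattern and Matcher come from the JDK.

```java
import java.util.regex.Pattern;

class CharClassCheck {
    // Same class as in the question. "\\p" in Java source is the regex token \p.
    private static final Pattern NOT_KEPT = Pattern.compile("[^(\\p{L}')]");

    // Hypothetical helper: true if replaceAll would delete this one-character string.
    static boolean isRemoved(String ch) {
        return NOT_KEPT.matcher(ch).matches();
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent, written as an escape
        // so the example does not depend on the source file encoding.
        String[] samples = {"a", "\u00e9", "(", ")", "'", "1", " ", "!"};
        for (String ch : samples) {
            System.out.println("\"" + ch + "\" removed: " + isRemoved(ch));
        }
    }
}
```

Letters, the two parentheses and the apostrophe should come back as not removed; the digit, the space and the exclamation mark as removed.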

Answer 2

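The two regexes are not the same:

[^a-zA-Z] matches any char that is not an English letter
[^(\\p{L}')] matches any char that is not a letter (in any language), an apostrophe or a parenthesis

i.e. with replaceAll the 2nd one leaves apostrophes and parentheses (and non-English letters) in the string, while the 1st one strips everything except a-z and A-Z.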
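
A minimal sketch to see the difference on a concrete string. The class and method names are hypothetical, and Locale.ROOT is my addition so that toLowerCase() behaves the same on every machine; the question's code calls toLowerCase() without a locale.

```java
import java.util.Locale;

class CompareRegexes {
    // First pattern from the question: keep only a-z and A-Z.
    static String asciiLettersOnly(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-zA-Z]", "");
    }

    // Second pattern: keep letters, apostrophes and parentheses.
    static String lettersQuotesParens(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^(\\p{L}')]", "");
    }

    public static void main(String[] args) {
        String input = "Madam, I'm (Adam)!";
        System.out.println(asciiLettersOnly(input));    // madamimadam
        System.out.println(lettersQuotesParens(input)); // madami'm(adam)
    }
}
```

For a palindrome check only the first result reads the same in both directions, so the second pattern only "works" on input that has no apostrophes or parentheses.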

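The regex \\p{L} is the Unicode category for "any letter", in any script. It is not a POSIX class; Java's POSIX classes, such as \\p{Alpha}, are ASCII-only by default. I.e. these two regexes are equivalent only when the input contains no letters other than English ones: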

[a-zA-Z]
\\p{L}
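
As a rough illustration of where the two stop being equivalent, the sketch below strips non-letters with each class. The names are hypothetical, and the accented letters are written as \u escapes (precomposed characters) so the result does not depend on file encoding.

```java
class UnicodeLetters {
    // Keep only ASCII letters.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // Keep anything Unicode classifies as a letter.
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        String english = "hello world 1";
        String french = "na\u00efve caf\u00e9 42"; // "naive cafe 42" with accents on i and e

        // Same result for English-only input.
        System.out.println(keepAsciiLetters(english));   // helloworld
        System.out.println(keepUnicodeLetters(english)); // helloworld

        // Different results once accented letters appear.
        System.out.println(keepAsciiLetters(french));    // navecaf
        System.out.println(keepUnicodeLetters(french));  // accented letters are kept
    }
}
```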
