What does this regex syntax actually mean in Java?

I wrote a program to detect palindromes. It works with what I have, but I stumbled upon another bit of syntax, and I would like to know exactly what it means.

This is the line of code I'm using:

    userString = userString.toLowerCase().replaceAll("[^a-zA-Z]", "");

I understand that the pattern in this replaceAll call means "match characters ([...]) that are not (^) in the ranges a-z and A-Z (a-zA-Z)."

However, this worked as well:

    replaceAll("[^(\\p{L}')]", "");

I just don't understand how to translate that into English. I am completely new to regular expressions, and I find them quite fascinating. Thanks to anyone who can tell me what it means.

You should check this website: https://regex101.com

It helped me a lot when I was writing/testing/debugging some regexes ;)

It gives the following explanation:

[^(\\p{L}')] matches a single character not present in the list below:

  • ( the literal character (
  • \\p{L} any kind of letter from any language (the Java string literal "\\p{L}" is the regex \p{L})
  • ') either of the literal characters ' and )
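To check that breakdown one character at a time, here is a minimal sketch. The class name ClassMembers and the helper isRemoved are made up for illustration; only the pattern comes from the question.

```java
import java.util.regex.Pattern;

public class ClassMembers {
    // Same pattern as in the question; "\\p{L}" in Java source is the regex \p{L}.
    static final Pattern NOT_KEPT = Pattern.compile("[^(\\p{L}')]");

    // True if the single character c is matched by the negated class,
    // i.e. replaceAll(..., "") would remove it.
    static boolean isRemoved(char c) {
        return NOT_KEPT.matcher(String.valueOf(c)).matches();
    }

    public static void main(String[] args) {
        // Letters, parentheses and the apostrophe are listed in the class, so they are kept.
        for (char c : new char[] {'a', 'Z', '(', ')', '\''}) {
            System.out.println("'" + c + "' removed? " + isRemoved(c)); // false
        }
        // Digits, spaces and other punctuation are not listed, so they are removed.
        for (char c : new char[] {'1', ' ', '!'}) {
            System.out.println("'" + c + "' removed? " + isRemoved(c)); // true
        }
    }
}
```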

The two regexes are not the same:

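  • [^a-zA-Z] matches any char that is not an unaccented English (ASCII) letter
  • [^(\\p{L}')] matches any char that is not a letter (of any language), an apostrophe or a parenthesis

i.e. when used with replaceAll(..., ""), the 2nd one keeps parentheses and apostrophes (and non-English letters), whereas the 1st one removes them.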
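A small sketch that runs both patterns over the same input makes the difference visible. The class and method names and the sample string are invented for the example, and it uses toLowerCase(Locale.ROOT) rather than the plain toLowerCase() from the question so that the result does not depend on the default locale.

```java
import java.util.Locale;

public class CompareRegexes {
    // First pattern from the question: keep only a-z and A-Z.
    static String stripToAsciiLetters(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^a-zA-Z]", "");
    }

    // Second pattern: keep any Unicode letter, plus ( ) and '.
    static String stripToLettersParensApostrophes(String s) {
        return s.toLowerCase(Locale.ROOT).replaceAll("[^(\\p{L}')]", "");
    }

    public static void main(String[] args) {
        String input = "Madam, I'm Adam (1 2 3)";
        System.out.println(stripToAsciiLetters(input));             // madamimadam
        System.out.println(stripToLettersParensApostrophes(input)); // madami'madam()
    }
}
```

For a palindrome check, the first result reads the same in both directions, while the second does not because the apostrophe and parentheses survive.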

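The regex \\p{L} is the Unicode category for "any letter" (it is not one of the POSIX classes such as \\p{Alpha}, which in Java cover US-ASCII only by default). I.e. these two regexes match the same characters as long as the input contains only English letters: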

  • [a-zA-Z]
  • \\p{L}
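As a rough illustration of where the two classes agree and where they diverge, consider the sketch below. The class name, method names and sample strings are assumptions made for the example; the accented letter is written as a Unicode escape to avoid source-encoding issues.

```java
public class LetterClasses {
    // Keeps only the ASCII letters a-z and A-Z.
    static String keepAsciiLetters(String s) {
        return s.replaceAll("[^a-zA-Z]", "");
    }

    // Keeps every character in Unicode category L (letters of any script).
    static String keepUnicodeLetters(String s) {
        return s.replaceAll("[^\\p{L}]", "");
    }

    public static void main(String[] args) {
        String english = "Was it a car?";
        String french = "caf\u00e9 42"; // \u00e9 is e with an acute accent

        // English-only input: both patterns give the same result.
        System.out.println(keepAsciiLetters(english).equals(keepUnicodeLetters(english))); // true

        // Accented input: the ASCII range drops the accented letter, \p{L} keeps it.
        System.out.println(keepAsciiLetters(french));            // caf
        System.out.println(keepUnicodeLetters(french).length()); // 4
    }
}
```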
