How to generate Unicode “Immutable Identifiers” chars in Java?

Question

I am trying to validate whether a dependency works with a specific set of Unicode characters called "Immutable Identifiers": http://www.unicode.org/reports/tr31/#Immutable_Identifier_Syntax

The definition of "Immutable Identifier" characters is:

Immutable Identifiers: To meet this requirement, an implementation shall define identifiers to be any non-empty string of characters that contains no character having any of the following property values:

Pattern_White_Space=True
Pattern_Syntax=True
General_Category=Private_Use, Surrogate, or Control
Noncharacter_Code_Point=True

I was able to find the Surrogate, PRIVATE_USE and Control characters in https://docs.oracle.com/javase/7/docs/api/java/lang/Character.html but could not find the rest. The Unicode document is also fairly complex for me, so I could not work out the code point ranges for these "immutable identifier" characters. Can anyone with some context shed some light?

Answer 1

Start with the javadoc of Pattern, especially the table of (Unicode) character classes. It also contains links to the Unicode references.

"\\p{Space}"   // Whitespace (ASCII-only unless UNICODE_CHARACTER_CLASS is set)
"\\p{Punct}"   // Punctuation (ASCII-only unless UNICODE_CHARACTER_CLASS is set)
"\\p{M}"       // Combining marks: diacritics, zero-width accents

And more.

Furthermore, you might want to normalize the identifier. "é" can be written as one Unicode code point, or as two code points: a Latin e followed by a zero-width combining accent. java.text.Normalizer can convert between the two. The composed form (NFC, one code point) seems best.
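As a minimal sketch of the normalization step, the snippet below composes a decomposed "é" into a single code point with java.text.Normalizer. The class name NormalizeDemo and the helper toNfc are made up for illustration; only Normalizer.normalize and Normalizer.Form.NFC come from the JDK.

```java
import java.text.Normalizer;

class NormalizeDemo {

    // Returns the NFC (canonically composed) form of the input.
    static String toNfc(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFC);
    }

    public static void main(String[] args) {
        // Latin "e" followed by U+0301 COMBINING ACUTE ACCENT: two code points.
        String decomposed = "e\u0301";
        String composed = toNfc(decomposed);

        System.out.println(decomposed.codePointCount(0, decomposed.length()));
        System.out.println(composed.codePointCount(0, composed.length()));
        // True when the composed form is the single code point U+00E9.
        System.out.println(composed.equals("\u00e9"));
    }
}
```

Normalizing before validating means two visually identical identifiers are compared in the same form.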

Please take a look at the UAX.

"\\p{Pattern_Syntax}"

I am not sure, but the Pattern_Syntax characters probably include []?+*. and similar, so I would think \\p{Punct} would cover them too.
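The four exclusions quoted in the question can also be checked per code point without regular expressions. The sketch below is one possible approach, not a definitive implementation. Pattern_White_Space and Noncharacter_Code_Point are small, fixed sets, so they are written out directly. The General_Category test uses Character.getType. Pattern_Syntax is a long list of ranges, and as far as I can tell java.util.regex.Pattern does not list it among its supported binary properties, so the check is passed in as an IntPredicate. The class ImmutableIdentifiers and all method names are hypothetical, and isAsciiPatternSyntax is a deliberately incomplete stand-in that covers only the ASCII part of Pattern_Syntax. For full coverage you could plug in the ranges from the Unicode Character Database (PropList.txt), or, assuming ICU4J is on the classpath, something like cp -> UCharacter.hasBinaryProperty(cp, UProperty.PATTERN_SYNTAX).

```java
import java.util.function.IntPredicate;

class ImmutableIdentifiers {

    // Pattern_White_Space=True: U+0009..U+000D, U+0020, U+0085,
    // U+200E, U+200F, U+2028, U+2029.
    static boolean isPatternWhiteSpace(int cp) {
        return (cp >= 0x0009 && cp <= 0x000D) || cp == 0x0020 || cp == 0x0085
                || cp == 0x200E || cp == 0x200F || cp == 0x2028 || cp == 0x2029;
    }

    // Noncharacter_Code_Point=True: U+FDD0..U+FDEF plus the last two
    // code points of every plane (low 16 bits FFFE or FFFF).
    static boolean isNoncharacter(int cp) {
        return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
    }

    // General_Category=Private_Use, Surrogate, or Control.
    static boolean isExcludedCategory(int cp) {
        int type = Character.getType(cp);
        return type == Character.PRIVATE_USE
                || type == Character.SURROGATE
                || type == Character.CONTROL;
    }

    // ASSUMPTION: incomplete stand-in covering only ASCII Pattern_Syntax.
    // Non-ASCII Pattern_Syntax characters are NOT detected by this method.
    static boolean isAsciiPatternSyntax(int cp) {
        return (cp >= 0x21 && cp <= 0x2F) || (cp >= 0x3A && cp <= 0x40)
                || (cp >= 0x5B && cp <= 0x5E) || cp == 0x60
                || (cp >= 0x7B && cp <= 0x7E);
    }

    // True if cp may appear in an immutable identifier, given a
    // caller-supplied Pattern_Syntax test.
    static boolean isAllowed(int cp, IntPredicate isPatternSyntax) {
        return !isPatternWhiteSpace(cp) && !isPatternSyntax.test(cp)
                && !isNoncharacter(cp) && !isExcludedCategory(cp);
    }

    // Non-empty string containing no excluded code point.
    static boolean isImmutableIdentifier(String s, IntPredicate isPatternSyntax) {
        return !s.isEmpty()
                && s.codePoints().allMatch(cp -> isAllowed(cp, isPatternSyntax));
    }
}
```

To generate the characters rather than validate a string, one option is to loop over every code point from 0 to Character.MAX_CODE_POINT and keep those for which isAllowed returns true.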
