Valid Java identifiers according to the Java Language Specification

Many places on SO lead to the JLS section on Identifiers, but I have a question on what's written there.

The "Java letters" include uppercase and lowercase ASCII Latin letters A-Z (\u0041-\u005a), and a-z (\u0061-\u007a), and, for historical reasons, the ASCII underscore (_, or \u005f) and dollar sign ($, or \u0024). The $ character should be used only in mechanically generated source code or, rarely, to access pre-existing names on legacy systems. The "Java digits" include the ASCII digits 0-9 (\u0030-\u0039).

But it goes on to say:

Letters and digits may be drawn from the entire Unicode character set, which supports most writing scripts in use in the world today, including the large sets for Chinese, Japanese, and Korean. This allows programmers to use identifiers in their programs that are written in their native languages.

I don't understand how these can both be true. The first section seems to dictate exactly which characters are allowed whereas the second section seems to say that the allowance is much more flexible.

I agree that usage of "includes" instead of "includes but is not limited to" shows that it doesn't exactly contradict. But it also first refers specifically to "Java letters"/"Java digits" and then relaxes this to just "letters"/"digits". My main point is lack of clarity and I wanted confirmation on what I assumed it meant.

As the question "Legal identifiers in Java" shows, there are many legal identifiers.

[For languages using the roman alphabet] only alphanumeric characters and occasionally underscores are used when naming identifiers by convention. However, a vast array of characters can be used.

The first paragraph refers to the code style, or convention, among Java programmers of using a reasonably consistent and readable naming scheme. The second paragraph you've quoted explains that there is a vast array of other characters that the compiler will accept, although your fellow programmers may disapprove.
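To make the "vast array" concrete, here is a minimal sketch of identifiers that fall outside the usual convention but still compile. The class and variable names are invented for illustration. The non-ASCII names are written as Unicode escapes so the file compiles regardless of its encoding; in a UTF-8 source file the characters could be typed directly.

```java
// Sketch only: every name below is made up for illustration.
class UnusualIdentifiers {

    static int sum() {
        int _count = 1;         // leading underscore
        int $generated = 2;     // dollar sign, discouraged outside generated code
        int \u00e9t\u00e9 = 3;  // the French word for "summer", spelled with escapes
        int \u6570 = 4;         // a single CJK ideograph, also spelled with an escape
        return _count + $generated + \u00e9t\u00e9 + \u6570;
    }

    public static void main(String[] args) {
        System.out.println(sum()); // prints 10
    }
}
```

All four declarations are legal, though only the first would look familiar in most code bases.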

The first passage is a special case of the second. The characters mentioned in both passages must satisfy a criterion from JLS 3.8 that the question leaves out:

A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true. A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.

These methods accept a code point from anywhere in the Unicode character set (the second passage), which includes the Basic Latin block (the first passage).
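As a rough illustration of the two methods quoted from JLS 3.8, the sketch below walks a string by code point and applies `Character.isJavaIdentifierStart(int)` to the first one and `Character.isJavaIdentifierPart(int)` to the rest. `IdentifierCheck` and `isValidIdentifier` are hypothetical names, and the check is deliberately incomplete: it does not reject keywords or literals such as `class`, `true`, or `null`, which the JLS also excludes from identifiers.

```java
// Sketch: IdentifierCheck and isValidIdentifier are hypothetical names.
class IdentifierCheck {

    // True if s is one "Java letter" followed by zero or more
    // "Java letter-or-digit" characters. Keywords are NOT rejected here.
    static boolean isValidIdentifier(String s) {
        if (s == null || s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Step by code point so supplementary characters are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        // The last sample is a two-character Japanese word given as escapes.
        String[] samples = {"count", "_tmp", "$gen", "9lives", "a-b", "\u540d\u524d"};
        for (String s : samples) {
            System.out.println(s + " -> " + isValidIdentifier(s));
        }
    }
}
```

Here `count`, `_tmp`, `$gen`, and the Japanese sample pass, while `9lives` (starts with a digit) and `a-b` (contains a hyphen) fail. If the standard library is preferred over a hand-rolled loop, `javax.lang.model.SourceVersion` offers `isIdentifier` and `isName`, which perform similar checks.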

In practice, you will rarely see anybody go beyond the Basic Latin character set in their Java source files.
