What are "connecting characters" in Java identifiers?

I am studying for the SCJP and I have a question regarding this line:

Identifiers must start with a letter, a currency character ($), or a connecting character such as the underscore (_). Identifiers cannot start with a number!

It states that a valid identifier name can start with a connecting character such as the underscore. I thought the underscore was the only valid option. What other connecting characters are there?

Here is a list of connecting characters. These are characters used to connect words.

http://www.fileformat.info/info/unicode/category/Pc/list.htm

U+005F _ LOW LINE
U+203F ‿ UNDERTIE
U+2040 ⁀ CHARACTER TIE
U+2054 ⁔ INVERTED UNDERTIE
U+FE33 ︳ PRESENTATION FORM FOR VERTICAL LOW LINE
U+FE34 ︴ PRESENTATION FORM FOR VERTICAL WAVY LOW LINE
U+FE4D ﹍ DASHED LOW LINE
U+FE4E ﹎ CENTRELINE LOW LINE
U+FE4F ﹏ WAVY LOW LINE
U+FF3F ＿ FULLWIDTH LOW LINE

This compiles on Java 7.

int _, ‿, ⁀, ⁔, ︳, ︴, ﹍, ﹎, ﹏, ＿;

An example: in this case, tp is both the name of a column and the value for a given row.

Column<Double> ︴tp︴ = table.getColumn("tp", double.class);

double tp = row.getDouble(︴tp︴);

The following

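for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    // Print every code point that may start an identifier but is not alphabetic.
    if (Character.isJavaIdentifierStart(i) && !Character.isAlphabetic(i)) {
        // Character.toChars handles code points above U+FFFF, which a (char) cast would truncate.
        System.out.print(new String(Character.toChars(i)) + " ");
    }
}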

prints

$ _ ¢ £ ¤ ¥ ؋ ৲ ৳ ৻ ૱ ௹ ฿ ៛ ‿ ⁀ ⁔ ₠ ₡ ₢ ₣ ₤ ₥ ₦ ₧ ₨ ₩ ₪ ₫ € ₭ ₮ ₯ ₰ ₱ ₲ ₳ ₴ ₵ ₶ ₷ ₸ ₹ ꠸ ﷼ ︳ ︴ ﹍ ﹎ ﹏ ﹩ ＄ ＿ ￠ ￡ ￥ ￦

Iterate through all 65k chars and ask Character.isJavaIdentifierStart(c). The answer is: "undertie", decimal 8255.

The definitive specification of a legal Java identifier can be found in the Java Language Specification.

Here is a list of connector characters in Unicode. You will not find them on your keyboard.

U+005F LOW LINE _
U+203F UNDERTIE ‿
U+2040 CHARACTER TIE ⁀
U+2054 INVERTED UNDERTIE ⁔
U+FE33 PRESENTATION FORM FOR VERTICAL LOW LINE ︳
U+FE34 PRESENTATION FORM FOR VERTICAL WAVY LOW LINE ︴
U+FE4D DASHED LOW LINE ﹍
U+FE4E CENTRELINE LOW LINE ﹎
U+FE4F WAVY LOW LINE ﹏
U+FF3F FULLWIDTH LOW LINE ＿

A connecting character is used to connect two characters.

In Java, a connecting character is one for which Character.getType(int codePoint) / Character.getType(char ch) returns a value equal to Character.CONNECTOR_PUNCTUATION.

Note that in Java, character information is based on the Unicode standard, which identifies connecting characters by assigning them the general category Pc, an alias for Connector_Punctuation.

The following code snippet,

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    if (Character.getType(i) == Character.CONNECTOR_PUNCTUATION
            && Character.isJavaIdentifierStart(i)) {
        System.out.println("character: " + String.valueOf(Character.toChars(i))
                + ", codepoint: " + i + ", hexcode: " + Integer.toHexString(i));
    }
}

prints the connecting characters that can be used to start an identifier on jdk1.6.0_45:

character: _, codepoint: 95, hexcode: 5f
character: ‿, codepoint: 8255, hexcode: 203f
character: ⁀, codepoint: 8256, hexcode: 2040
character: ⁔, codepoint: 8276, hexcode: 2054
character: ・, codepoint: 12539, hexcode: 30fb
character: ︳, codepoint: 65075, hexcode: fe33
character: ︴, codepoint: 65076, hexcode: fe34
character: ﹍, codepoint: 65101, hexcode: fe4d
character: ﹎, codepoint: 65102, hexcode: fe4e
character: ﹏, codepoint: 65103, hexcode: fe4f
character: ＿, codepoint: 65343, hexcode: ff3f
character: ･, codepoint: 65381, hexcode: ff65

The following compiles on jdk1.6.0_45:

int _, ‿, ⁀, ⁔, ・, ︳, ︴, ﹍, ﹎, ﹏, ＿, ･ = 0;

Apparently, the above declaration fails to compile on jdk1.7.0_80 and jdk1.8.0_51 because of the following two connecting characters (backward compatibility... oops!):

character: ・, codepoint: 12539, hexcode: 30fb
character: ･, codepoint: 65381, hexcode: ff65
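
To see how a particular JDK treats these two code points, one option is to ask the Character class directly. The sketch below is only illustrative: the class name ConnectorCheck and the describe helper are made up for this example, and it assumes the difference between JDK versions comes from the Unicode data each JDK ships with, where newer data classifies both middle dots as other punctuation rather than connector punctuation.

```java
final class ConnectorCheck {
    // Hypothetical helper: reports how the running JDK classifies one code point.
    static String describe(int codePoint) {
        boolean connector = Character.getType(codePoint) == Character.CONNECTOR_PUNCTUATION;
        boolean start = Character.isJavaIdentifierStart(codePoint);
        return String.format("U+%04X connector=%b identifierStart=%b", codePoint, connector, start);
    }

    public static void main(String[] args) {
        // The result depends on the JDK, so print its version alongside.
        System.out.println("java.version = " + System.getProperty("java.version"));
        System.out.println(describe(0x30FB)); // KATAKANA MIDDLE DOT
        System.out.println(describe(0xFF65)); // HALFWIDTH KATAKANA MIDDLE DOT
        System.out.println(describe(0x203F)); // UNDERTIE, for comparison
    }
}
```

Running it on both an old and a new JDK should show whether the classification of the two middle dots differs between them.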

Anyway, details aside, the exam focuses only on the Basic Latin character set.

Also, the rules for legal identifiers in Java are given in the Java Language Specification (section 3.8, Identifiers). Use the Character class APIs to get more details.
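
As a rough illustration of using the Character class APIs for this, here is a minimal sketch that checks whether a string has the lexical shape of an identifier. The class and method names (IdentifierCheck, looksLikeIdentifier) are invented for this example, and the check deliberately ignores reserved words, so "class" still passes.

```java
final class IdentifierCheck {
    // Returns true if s is shaped like a Java identifier: a valid start code point
    // followed only by valid part code points. Keywords and literals such as
    // "class", "true" or "null" are NOT rejected here.
    static boolean looksLikeIdentifier(String s) {
        if (s.isEmpty()) {
            return false;
        }
        int first = s.codePointAt(0);
        if (!Character.isJavaIdentifierStart(first)) {
            return false;
        }
        // Walk by code point so characters above U+FFFF are handled correctly.
        for (int i = Character.charCount(first); i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (!Character.isJavaIdentifierPart(cp)) {
                return false;
            }
            i += Character.charCount(cp);
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(looksLikeIdentifier("_tp"));      // underscore start
        System.out.println(looksLikeIdentifier("\u203Ftp")); // undertie start
        System.out.println(looksLikeIdentifier("1abc"));     // digit start
    }
}
```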

One of the most, well, fun characters allowed in Java identifiers (though not at the start) is the Unicode character named "Zero Width Non Joiner" (&zwnj;, U+200C, https://en.wikipedia.org/wiki/Zero-width_non-joiner).

I once ran into this in a piece of XML, inside an attribute value holding a reference to another part of that XML. Since the ZWNJ is "zero width", it cannot be seen (except when stepping through the text with the cursor, where it appears to sit right on the preceding character). It also couldn't be seen in the logfile or console output. But it was there all the time: copying and pasting into search fields carried it along, so the search did not find the referred position. Typing the (visible part of the) string into the search field, however, did find it. Took me a while to figure this out.

Typing a zero-width non-joiner is actually quite easy (too easy) when using the European keyboard layout, at least in its German variant, e.g. "Europatastatur 2.02". It is reachable with AltGr + ".", two keys which unfortunately sit directly next to each other on most keyboards and can easily be hit together by accident.

Back to Java: I thought, well, you could write some code like this:

void foo() {
    int i = 1;
    int i‌ = 2;
}

with the second i followed by a zero-width non-joiner (I can't reproduce that in the snippet above in Stack Overflow's editor), but that didn't work. IntelliJ (16.3.3) did not complain, but javac (Java 8) complained about an already defined identifier. It seems javac does allow the ZWNJ character as part of an identifier, but when using reflection to see what it does, the ZWNJ character is stripped from the identifier, something that does not happen to characters like ‿.
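
A plausible explanation, offered here as an assumption rather than something the experiment above confirms: U+200C is a format character, and Character.isIdentifierIgnorable reports such characters as ignorable inside identifiers, so "i" and "i" plus ZWNJ may end up being treated as the same name. The sketch below shows what the Character class says about ZWNJ compared with the undertie; IgnorableCheck and stripIgnorable are hypothetical names used only for illustration.

```java
final class IgnorableCheck {
    static final int ZWNJ = 0x200C;     // ZERO WIDTH NON-JOINER
    static final int UNDERTIE = 0x203F; // UNDERTIE, a connector punctuation character

    // Hypothetical helper: drops every code point the Character class reports as ignorable.
    static String stripIgnorable(String identifier) {
        StringBuilder sb = new StringBuilder();
        identifier.codePoints()
                .filter(cp -> !Character.isIdentifierIgnorable(cp))
                .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println("ZWNJ part:          " + Character.isJavaIdentifierPart(ZWNJ));
        System.out.println("ZWNJ start:         " + Character.isJavaIdentifierStart(ZWNJ));
        System.out.println("ZWNJ ignorable:     " + Character.isIdentifierIgnorable(ZWNJ));
        System.out.println("undertie ignorable: " + Character.isIdentifierIgnorable(UNDERTIE));
        // "i" followed by ZWNJ collapses to plain "i" once ignorable characters are dropped.
        System.out.println(stripIgnorable("i\u200C").equals("i"));
    }
}
```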

The list of characters you can use inside your identifiers (rather than just at the start) is much more fun:

for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
    if (Character.isJavaIdentifierPart(i) && !Character.isAlphabetic(i)) {
        // Character.toChars avoids truncating code points above U+FFFF.
        System.out.print(new String(Character.toChars(i)) + " ");
    }
}

The list is:

I wanted to post the output, but it's forbidden by the SO spam filter. That's how fun it is!

It includes most of the control characters! I mean bells and shit! You can make your source code ring the fn bell! Or use characters which will only be displayed sometimes, like the soft hyphen.
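
Since the raw output is awkward to paste or display, one workaround is to print the code points in U+XXXX form instead of as characters. This is just a sketch with invented names (IdentifierPartDump, firstParts), and the exact list depends on the Unicode data of the JDK that runs it.

```java
import java.util.ArrayList;
import java.util.List;

final class IdentifierPartDump {
    // Collects the first `limit` non-alphabetic code points that are allowed
    // inside an identifier, formatted as U+XXXX instead of raw characters.
    static List<String> firstParts(int limit) {
        List<String> out = new ArrayList<>();
        for (int cp = Character.MIN_CODE_POINT;
                cp <= Character.MAX_CODE_POINT && out.size() < limit; cp++) {
            if (Character.isJavaIdentifierPart(cp) && !Character.isAlphabetic(cp)) {
                out.add(String.format("U+%04X", cp));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The list starts with control characters, including U+0007 (the bell).
        System.out.println(firstParts(40));
        // The soft hyphen (U+00AD) is accepted inside identifiers as well.
        System.out.println(Character.isJavaIdentifierPart(0x00AD));
    }
}
```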
