What's an “ignorable character in a Java identifier”
I stumbled across this doc and wondered what that was all about. Apparently you can have certain control characters inside identifiers and they are ignored:
```java
public class Main {
    public static void main(String[] args) throws Exception {
        int dummy = 123;
        // The Unicode escape puts U+200B (ZERO WIDTH SPACE) after the `d`, before the `u`
        System.out.println(d\u200Bummy);
    }
}
```
I couldn't find anything about this in the JLS. IntelliJ IDEA shows an error in the editor saying "dummy" is an undeclared identifier (but the code nevertheless compiles and runs). I guess that's a bug in IntelliJ?
What purpose do these "ignorable characters" serve?
(Note: StackOverflow seems to remove my control characters from the question.)
There is an open issue for this contradiction.
In summary, the compiler does ignore these characters when matching identifier names, but the JLS doesn't mention this. Instead, the JLS says:
Two identifiers are the same only if they are identical, that is, have the same Unicode character for each letter or digit.
Also:
A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true
The contradiction is obvious, since:
Character.isJavaIdentifierPart('\u0001') -> true, so per the JLS it is a letter-or-digit that takes part in comparing identifier names
Character.isIdentifierIgnorable('\u0001') -> true, so it should actually be ignored
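A small program can make the contradiction concrete. The sketch below, assuming Java 8 or later, prints both predicates for U+0001 and U+200B, then compares `d\u200Bummy` with `dummy` twice: once code point by code point (the literal JLS wording) and once after dropping ignorable characters (what the compiler appears to do). The `canonicalName` helper is hypothetical; it only approximates the observed javac behavior and is not javac's actual code.

```java
public class IgnorableCheck {

    // Hypothetical helper (not a JDK or javac API): drops every code point that
    // Character.isIdentifierIgnorable reports as ignorable, approximating the
    // name javac appears to use when matching identifiers.
    static String canonicalName(String identifier) {
        StringBuilder sb = new StringBuilder();
        identifier.codePoints()
                  .filter(cp -> !Character.isIdentifierIgnorable(cp))
                  .forEach(sb::appendCodePoint);
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+0001 is an ISO control character; U+200B (ZERO WIDTH SPACE) has general category Cf
        int[] samples = {0x0001, 0x200B};
        for (int cp : samples) {
            System.out.printf("U+%04X isJavaIdentifierPart=%b isIdentifierIgnorable=%b%n",
                    cp, Character.isJavaIdentifierPart(cp), Character.isIdentifierIgnorable(cp));
        }

        String withZwsp = "d\u200Bummy";
        // Literal reading of the JLS: every letter-or-digit must match
        System.out.println("code points equal: " + withZwsp.equals("dummy"));
        // Observed compiler behavior: ignorable characters are dropped before comparing
        System.out.println("canonical names equal: " + canonicalName(withZwsp).equals("dummy"));
    }
}
```

Both checks look at the same pair of strings. They disagree only because the second one drops ignorable characters first, which is exactly the gap between the JLS wording and what javac accepts.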
I speculate that IntelliJ IDEA follows the JLS, or that its developers are simply unaware of ignorable characters. I don't see a bug report for this here.
As for the purpose of these ignorables: Unicode specifies some Layout and Format Control Characters. It is suggested that these characters be ignored in identifier names, first because
the effects they represent are stylistic or otherwise out of scope for identifiers, and second because the characters themselves often have no visible display
Apparently the purpose of isIdentifierIgnorable is to identify characters of this category. For instance, the isIdentifierIgnorable documentation mentions that it returns true for characters that have the FORMAT general category value, i.e. characters whose Unicode General_Category value is Cf, which are included in the Layout and Format Control Characters.