Is it a good idea to use Unicode symbols as Java identifiers?

I have a snippet of code that looks like this:

double Δt = lastPollTime - pollTime;
double α = 1 - Math.exp(-Δt / τ);
average += α * (x - average);

Just how bad an idea is it to use unicode characters in Java identifiers? Or is this perfectly acceptable?
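For reference, here is a minimal sketch showing that the snippet is legal Java as written. The class name `SmoothedAverage`, the method names, and the choice to pass the elapsed time `Δt` in directly (assumed positive, rather than computed from two timestamps) are all assumptions for illustration. The ASCII twin `updateAscii` is likewise hypothetical; it spells the same formula with English names to show that renaming changes readability, not behaviour. The file must be saved as UTF-8 and compiled with `javac -encoding UTF-8` (UTF-8 is the default source encoding from JDK 18 on).

```java
// Hypothetical class; save as UTF-8 and compile with: javac -encoding UTF-8
final class SmoothedAverage {
    // Time constant, in the same units as the elapsed time. Because it is a field,
    // its name is stored in the compiled .class file.
    private final double τ;
    private double average; // current smoothed value

    SmoothedAverage(double τ, double initial) {
        this.τ = τ;
        this.average = initial;
    }

    /** One exponential-smoothing step using the Greek identifiers; Δt is the elapsed time. */
    double update(double x, double Δt) {
        double α = 1 - Math.exp(-Δt / τ); // weight given to the new sample
        average += α * (x - average);
        return average;
    }

    /** The same step spelled with ASCII identifiers; the arithmetic is identical. */
    static double updateAscii(double average, double x, double deltaTime, double tau) {
        double alpha = 1 - Math.exp(-deltaTime / tau);
        return average + alpha * (x - average);
    }

    public static void main(String[] args) {
        SmoothedAverage avg = new SmoothedAverage(2.0, 0.0);
        System.out.println(avg.update(10.0, 1.0));
        System.out.println(updateAscii(0.0, 10.0, 1.0, 2.0));
    }
}
```

Both lines print the same value; the choice between them is purely about who has to read and type the code.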

It's a bad idea, for various reasons.

  • Many people's keyboards do not support these characters. If I were to maintain that code on a QWERTY keyboard (or any other layout without Greek letters), I'd have to copy and paste those characters all the time.

  • Some people's editors or terminals might not display these characters properly. For example, some editors (unfortunately) still default to some ISO-8859 (Latin) variant. The main reason why ASCII is still so prevalent is that it nearly always works.

  • Even if the characters can be rendered properly, they may cause confusion. Straight from Sun (emphasis mine):

    Identifiers that have the same external appearance may yet be different. For example, the identifiers consisting of the single letters LATIN CAPITAL LETTER A (A, \u0041), LATIN SMALL LETTER A (a, \u0061), GREEK CAPITAL LETTER ALPHA (Α, \u0391), CYRILLIC SMALL LETTER A (а, \u0430) and MATHEMATICAL BOLD ITALIC SMALL A (𝒂, \ud835\udc82) are all different.

    ...

    Unicode composite characters are different from the decomposed characters. For example, a LATIN CAPITAL LETTER A ACUTE (Á, \u00c1) could be considered to be the same as a LATIN CAPITAL LETTER A (A, \u0041) immediately followed by a NON-SPACING ACUTE (´, \u0301) when sorting, but these are different in identifiers.

    This is in no way an imaginary problem: α (U+03b1 GREEK SMALL LETTER ALPHA) and ⍺ (U+237a APL FUNCTIONAL SYMBOL ALPHA) are different characters!

  • There is no easy way to tell which characters are valid. The characters from your code work, but when I use the FUNCTIONAL SYMBOL ALPHA, my Java compiler complains about "illegal character: \9082", even though the functional symbol would arguably be more appropriate in this code. There seems to be no simple rule about which characters are acceptable, short of asking Character.isJavaIdentifierPart().

  • Even if you get it to compile, it seems doubtful that all Java virtual machine implementations have been rigorously tested with Unicode identifiers. If these characters are only used for local variables in method scope, their names are compiled away (apart from optional debug information), but if they are class members, they will end up in the .class file as well, possibly breaking your program on buggy JVM implementations.
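The validity and look-alike points above can be checked directly with `Character.isJavaIdentifierStart(int)`, `Character.isJavaIdentifierPart(int)` and `java.text.Normalizer`. This is a sketch for poking at individual characters, not a linter; the class name `IdentifierCheck` and the helper `describe` are hypothetical, `Character.getName(int)` needs Java 7 or later, and every non-ASCII character is written as a numeric code point or as a `\u` escape inside a string literal so the file itself stays pure ASCII.

```java
import java.text.Normalizer;

final class IdentifierCheck {
    // Reports whether a code point may start a Java identifier, or only appear inside one.
    static String describe(int codePoint) {
        String name = Character.getName(codePoint);
        boolean start = Character.isJavaIdentifierStart(codePoint);
        boolean part = Character.isJavaIdentifierPart(codePoint);
        return String.format("U+%04X %s: start=%b, part=%b", codePoint, name, start, part);
    }

    public static void main(String[] args) {
        int[] samples = {
            0x0041,  // LATIN CAPITAL LETTER A
            0x0391,  // GREEK CAPITAL LETTER ALPHA (looks like A)
            0x0061,  // LATIN SMALL LETTER A
            0x0430,  // CYRILLIC SMALL LETTER A (looks like a)
            0x1D482, // MATHEMATICAL BOLD ITALIC SMALL A (outside the Basic Multilingual Plane)
            0x03B1,  // GREEK SMALL LETTER ALPHA, as used in the question
            0x237A,  // APL FUNCTIONAL SYMBOL ALPHA, a symbol rather than a letter
            0x0301,  // COMBINING ACUTE ACCENT: allowed inside an identifier, not at its start
        };
        for (int cp : samples) {
            System.out.println(describe(cp));
        }

        // Identifiers that look alike are still different identifiers.
        System.out.println("A".equals("\u0391")); // Latin A vs Greek Alpha

        // A composed character and its decomposed form are different identifiers,
        // even though normalization to NFC maps one onto the other.
        String composed = "\u00C1";   // LATIN CAPITAL LETTER A WITH ACUTE
        String decomposed = "A\u0301"; // A followed by COMBINING ACUTE ACCENT
        System.out.println(composed.equals(decomposed));
        System.out.println(composed.equals(Normalizer.normalize(decomposed, Normalizer.Form.NFC)));
    }
}
```

The APL alpha comes back with `start=false, part=false`, which matches the compiler error quoted above, while the Cyrillic and Greek look-alikes are accepted as ordinary letters.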

It looks good because it uses the correct symbols, but how many of your team will know the keystrokes for those symbols?

I would use an English representation just to make it easier to type. And others might not have a character set that supports those symbols set up on their PC.

It is perfectly acceptable if it is acceptable in your working group. A lot of the answers here operate on the arrogant assumption that everybody programs in English. Non-English programmers are by no means rare these days, and they're getting less rare at an accelerating rate. Why should they restrict themselves to English versions when they have a perfectly good language at their disposal?

Anglophone arrogance aside, there are other legitimate reasons for using non-English identifiers. If you're writing mathematics packages, for example, using Greek is fine if your target is fellow mathematicians. Why should people type out "delta" in your workgroup when everybody can understand "Δ" and likely type it more quickly? Almost any problem domain will have its own jargon, and sometimes that jargon is expressed in something other than the Latin alphabet. Why on Earth would you want to try and jam everything into ASCII?

That code is fine to read, but horrible to maintain. I suggest using plain English identifiers like so:

double deltaTime = lastPollTime - pollTime;
double alpha = 1 - Math.exp(-delta....

It's an excellent idea. Honest. It's just not easily practicable at this time. Let's keep a reference to it for the future. I would love to see triangles, circles, squares, etc. as part of program code. But for now, please do try to rewrite it the way Crozin suggests.
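One data point on how far off that is: geometric shapes are classified by Unicode as "Other Symbol" (category So) rather than as letters, so today's Java compilers reject them in identifiers outright. A small sketch, assuming U+25B3 WHITE UP-POINTING TRIANGLE, U+25CB WHITE CIRCLE and U+25A1 WHITE SQUARE as the intended shapes; the class and method names are hypothetical.

```java
final class ShapeCheck {
    // True if the code point is in Unicode category So ("Other Symbol"),
    // a category that Character.isJavaIdentifierPart never accepts.
    static boolean isOtherSymbol(int codePoint) {
        return Character.getType(codePoint) == Character.OTHER_SYMBOL;
    }

    public static void main(String[] args) {
        int[] shapes = {0x25B3, 0x25CB, 0x25A1}; // triangle, circle, square
        for (int cp : shapes) {
            System.out.printf("U+%04X %s: otherSymbol=%b, identifierPart=%b%n",
                    cp, Character.getName(cp), isOtherSymbol(cp),
                    Character.isJavaIdentifierPart(cp));
        }
    }
}
```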

Why not? If the people working on that code can type those easily, it's acceptable.

But God help those who can't display Unicode, or who can't type it.

In a perfect world, this would be the recommended way.

Unfortunately, you run into character-encoding issues as soon as you move outside plain 7-bit ASCII (UTF-8 is different from ISO-Latin-1, which is different from UTF-16, etc.), meaning that you will eventually run into problems. This happened to me when moving from Windows to Linux. Our national Scandinavian characters broke in the process, but fortunately only in strings. We then used the \u encoding for all of those.
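The `\u` workaround is not limited to string literals. The Java compiler translates Unicode escapes before it tokenizes the source, so even an identifier such as the question's α can be written in a pure-ASCII file. A minimal sketch with hypothetical names; `NORDIC` holds æ, ø and å as escapes. The obvious trade-off is that an escaped identifier is unreadable in the source, which is why it mostly makes sense for string data. Passing `javac -encoding UTF-8` explicitly is the other way to take platform defaults out of the picture.

```java
final class EscapeDemo {
    // Holds the Danish/Norwegian letters ae-ligature, o-stroke and a-ring,
    // yet this source line contains only ASCII characters.
    static final String NORDIC = "\u00e6\u00f8\u00e5";

    static double weight(double deltaTime, double tau) {
        // The next declaration introduces a local variable named alpha (U+03B1),
        // written as an escape so the file does not depend on any source encoding.
        double \u03b1 = 1 - Math.exp(-deltaTime / tau);
        return \u03b1;
    }

    public static void main(String[] args) {
        System.out.println(NORDIC.length());
        System.out.println(weight(1.0, 2.0));
    }
}
```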

If you can be absolutely certain that you will never, ever run into such a thing (for instance, if your files contain a proper BOM), then by all means do this. It will make your code more readable. If you have even the smallest doubt, don't.

(Please note that "use non-English languages" is a different matter. I'm just thinking of using symbols instead of letters.)

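Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM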