Invalid character constant in a UTF-8 character

I'm trying to assign 'o͝' (a phonetic character) to a Character in a Java program, but I get the error "Invalid character constant". My file is saved as UTF-8 and other phonetic characters work fine, but not this one. It looks as if this character is actually two (an 'o' and a ligature or something like that), but I can't break it into its component parts.

Code example:

Character test = 'o͝';

Any help would be appreciated.

The glyph is called "small letter o with combining double breve" and can be written in source as:

String a = "\u006f\u035d";

Since it is a base letter followed by a combining character (i.e. two characters), the resulting value cannot be assigned to a single Java char; you'll need to use a String.
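A quick way to confirm the two-character claim is to inspect the string at runtime. The sketch below (the class name is hypothetical) counts UTF-16 code units and code points, then applies NFC normalization to check that the pair is not merged into one precomposed character. For contrast, 'o' plus COMBINING ACUTE ACCENT (U+0301) does compose to U+00F3, which would fit in a char.

```java
import java.text.Normalizer;

public class CombiningSequenceDemo {
    // 'o' (U+006F) followed by COMBINING DOUBLE BREVE (U+035D)
    static final String O_DOUBLE_BREVE = "\u006f\u035d";

    public static void main(String[] args) {
        String s = O_DOUBLE_BREVE;
        // Two UTF-16 code units, so the sequence cannot fit in a single char
        System.out.println("chars:       " + s.length());                      // 2
        // Two code points as well: a base letter plus a combining mark
        System.out.println("code points: " + s.codePointCount(0, s.length())); // 2

        // NFC leaves it as two characters: there is no precomposed "o with double breve"
        String nfc = Normalizer.normalize(s, Normalizer.Form.NFC);
        System.out.println("after NFC:   " + nfc.length());                    // 2

        // Contrast: 'o' + COMBINING ACUTE ACCENT (U+0301) composes to U+00F3
        String oAcute = Normalizer.normalize("o\u0301", Normalizer.Form.NFC);
        System.out.println("o + acute:   " + oAcute.length());                 // 1
        char single = oAcute.charAt(0); // this one does fit in a char
        System.out.println(single == '\u00f3');                                // true
    }
}
```

The sketch prints only counts and booleans, because whether the glyphs themselves render depends on the console's font and encoding.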

You can try looking up the character's code point in a character table and assigning it to a variable, for example:

char a = '\u0040';

As already said, you shouldn't hardcode characters like that; you should use the Unicode code point values found here:

http://www.utf8-chartable.de/
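Following that advice, a small helper can turn a pasted literal into escape sequences you can put in source, and `Character.getName` (Java 7+) reports what each code point is. The class and method names are hypothetical. The helper only escapes chars outside printable ASCII, so quotes and backslashes in the input would still need ordinary escaping.

```java
public class UnicodeEscaper {
    /** Rewrites every char outside printable ASCII as a Java-style backslash-u escape. */
    static String toJavaEscapes(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            if (c >= 0x20 && c < 0x7F) {
                sb.append(c); // printable ASCII is copied unchanged (quotes/backslashes are NOT escaped)
            } else {
                // One escape per UTF-16 code unit, so supplementary characters become surrogate
                // pairs, which is also how they must be written in Java source
                sb.append(String.format("\\u%04x", (int) c));
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        char at = '\u0040'; // '@', written by its code point
        System.out.println(at);

        String glyph = "o\u035d";
        // Prints 'o' followed by the escape text for U+035D
        System.out.println(toJavaEscapes(glyph));

        // Name every code point so you know what you are escaping
        glyph.codePoints().forEach(cp ->
                System.out.printf("U+%04X %s%n", cp, Character.getName(cp)));
    }
}
```

For the glyph in the question, the last loop lists U+006F LATIN SMALL LETTER O and U+035D COMBINING DOUBLE BREVE.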

What you want actually involves a "combining character":

http://en.wikipedia.org/wiki/Combining_character

The combining diacritical marks occupy U+0300 to U+036F. So, for example, to create the character you want ('o' with a double breve), use:

String o_doubleBreve = "o\u035d";

This prints as o͝.
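Since the marks sit in a fixed range, the same construction can be wrapped in a helper that refuses code points outside it. This is a sketch with hypothetical names; `StringBuilder.appendCodePoint` and `Character.UnicodeBlock.of` are standard JDK methods, and the block lookup is included only to show that it agrees with the numeric range.

```java
public class CombiningMarks {
    // Bounds of the Combining Diacritical Marks block (U+0300..U+036F)
    static final int FIRST = 0x0300;
    static final int LAST = 0x036F;

    /** True if cp lies in the Combining Diacritical Marks block. */
    static boolean isCombiningDiacritical(int cp) {
        return cp >= FIRST && cp <= LAST;
    }

    /** Appends a combining mark to a base letter, rejecting code points outside the block. */
    static String combine(char base, int mark) {
        if (!isCombiningDiacritical(mark)) {
            throw new IllegalArgumentException(
                    String.format("U+%04X is not in U+0300..U+036F", mark));
        }
        return new StringBuilder().append(base).appendCodePoint(mark).toString();
    }

    public static void main(String[] args) {
        String oDoubleBreve = combine('o', 0x035D);
        // Displays as o with a double breve if the console font and encoding support it
        System.out.println(oDoubleBreve);
        // The JDK's own block lookup agrees with the numeric range
        System.out.println(Character.UnicodeBlock.of(0x035D)
                == Character.UnicodeBlock.COMBINING_DIACRITICAL_MARKS); // true
    }
}
```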

I agree with the answers above that the \u escape representation is best in any new code you write. However, you will come across projects whose source code has this issue and which their authors were presumably able to compile. One such example I am working with now is OpenNLP.

If you run into something like this in an IDE such as Eclipse, you can follow a procedure like this to change the workspace's default text file encoding to UTF-8. That allows the code to compile successfully.
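The underlying problem in such projects is usually that the compiler decodes the source bytes with a different charset than the one the file was saved in. The sketch below assumes the file was saved as UTF-8 but read with a single-byte charset such as ISO-8859-1 (an assumption; the actual default depends on the platform), using e with acute accent (U+00E9) as a stand-in for any non-ASCII literal. The two UTF-8 bytes become two characters, so a one-character literal no longer has exactly one character between its quotes. Outside an IDE, the command-line counterpart of the Eclipse setting is `javac -encoding UTF-8`.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {
    /** Encodes text as UTF-8 (how the file was saved), then decodes it with readAs (how it was read). */
    static String saveAsUtf8ReadAs(String text, Charset readAs) {
        byte[] bytes = text.getBytes(StandardCharsets.UTF_8);
        return new String(bytes, readAs);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // one character, two bytes in UTF-8 (0xC3 0xA9)

        // Matching charsets: the literal survives as a single char
        System.out.println(saveAsUtf8ReadAs(eAcute, StandardCharsets.UTF_8).length());      // 1
        // Mismatched charsets: each byte becomes its own char, so a quoted
        // literal would now contain two characters and fail to compile
        System.out.println(saveAsUtf8ReadAs(eAcute, StandardCharsets.ISO_8859_1).length()); // 2
    }
}
```

Note that fixing the encoding does not help the 'o͝' in the original question: even when decoded correctly, it is still two characters and needs a String.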

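Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.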

Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM