"Unmappable character for encoding UTF-8" error

I'm getting a compile error at the following method.

public static boolean isValidPasswd(String passwd) {
    String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";
    return Pattern.matches(reg, passwd);
}

at Utility.java:[76,74] unmappable character for encoding UTF-8. The 74th character is ' " '.

How can I fix this? Thanks.

You have an encoding problem with your source file. It is probably ISO-8859-1 encoded, but the compiler was set to use UTF-8. This results in errors for any character whose byte representation differs between UTF-8 and ISO-8859-1. That is the case for every character outside ASCII, for example ¬ (NOT SIGN).

You can simulate this with the following program. It takes your line of source code, encodes it to an ISO-8859-1 byte array, and decodes those bytes "wrongly" as UTF-8. You can see at which position the line gets corrupted. I added 2 spaces to your source line so that position 74 lands on ¬ (NOT SIGN), which is the only character in the line that produces different bytes in ISO-8859-1 and UTF-8. I guess this matches the indentation of the real source file.

 String reg = "      String reg = \"^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¬.,-])(?=[^\\s]+$).{8,24}$\";";
 // Encode as ISO-8859-1, then decode the same bytes "wrongly" as UTF-8.
 String corrupt = new String(reg.getBytes("ISO-8859-1"), "UTF-8");
 System.out.println(corrupt + ": " + corrupt.charAt(74));
 System.out.println(reg + ": " + reg.charAt(74));

which results in the following output (messed up because of markup):

String reg = "^(?=. [0-9])(?=. [az])(?=. [AZ])(?=. [~#;:?/@&!"'%*= .,-])(?=[^\\s]+$).{8,24}$";:

String reg = "^(?=. [0-9])(?=. [az])(?=. [AZ])(?=. [~#;:?/@&!"'%*=¬.,-])(?=[^\\s]+$).{8,24}$";: ¬

See "live" at https://ideone.com/ShZnB

To fix this, save the source files with UTF-8 encoding.
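
The byte-level difference behind this answer can be checked directly. The following is a minimal sketch, not part of the original answer; the class name `NotSignBytes` and the `hex` helper are made up for illustration. It prints how ¬ is encoded in each charset and what happens when the single ISO-8859-1 byte is decoded as UTF-8:

```java
import java.nio.charset.StandardCharsets;

public class NotSignBytes {
    // Formats a byte array as space-separated upper-case hex, e.g. "C2 AC".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // NOT SIGN, written as a Unicode escape so this file itself is pure ASCII.
        String notSign = "\u00AC";
        byte[] latin1 = notSign.getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = notSign.getBytes(StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1: " + hex(latin1)); // ISO-8859-1: AC
        System.out.println("UTF-8:      " + hex(utf8));   // UTF-8:      C2 AC

        // A lone 0xAC byte is not valid UTF-8, so the String constructor
        // substitutes the replacement character U+FFFD.
        String decoded = new String(latin1, StandardCharsets.UTF_8);
        System.out.println(decoded.equals("\uFFFD"));      // true
    }
}
```

The String constructor silently substitutes U+FFFD for the bad byte, while javac reports the same byte as an "unmappable character".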

I'm in the process of setting up a CI build server on a Linux box for a legacy system started in 2000. There is a section that generates a PDF that contains non-UTF-8 characters. We are in the final steps of a release, so I cannot replace the characters giving me grief, yet for Dilbertesque reasons, I cannot wait a week to solve this issue after the release. Fortunately, the "javac" command in Ant has an "encoding" parameter.

 <javac destdir="${classes.dir}" classpathref="production-classpath" debug="on"
     includeantruntime="false" source="${java.level}" target="${java.level}"
     encoding="iso-8859-1">
     <src path="${production.dir}" />
 </javac>

The Java compiler assumes that your input is UTF-8 encoded, either because you specified it to be or because it's your platform default encoding.

However, the data in your .java files is not actually encoded in UTF-8. The problem is probably the ¬ character. Make sure your editor (or IDE) of choice actually saves its files in UTF-8 encoding.
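
To see which platform default applies on a given machine, a small sketch like the following may help. The class name `DefaultEncoding` is hypothetical, and it assumes the program runs in the same environment as the build, since javac uses the platform default when no -encoding option is given:

```java
import java.nio.charset.Charset;

public class DefaultEncoding {
    // Describes the JVM's default charset and the file.encoding system property.
    static String describe() {
        return "default charset: " + Charset.defaultCharset().name()
                + ", file.encoding: " + System.getProperty("file.encoding");
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

If the reported charset differs from the encoding the editor saved the file in, that mismatch is the likely cause of the error.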

In Eclipse, go to the file properties (Alt + Enter) and change Resource → 'Text file encoding' → Other to UTF-8. Reopen the file and check for a junk character somewhere in the string/file. Remove it. Save the file.

Change Resource → 'Text file encoding' back to Default.

Compile and deploy the code.

Thanks to Michael Konietzka ( https://stackoverflow.com/a/4996583/1019307 ) for your answer.

I did this in Eclipse / STS:

Preferences > General > Content Types > Selected "Text" 
    (which contains all types such as CSS, Java Source Files, ...)
Added "UTF-8" to the default encoding box down the bottom and hit 'Add'

Bingo, error gone!

For IntelliJ users, this is pretty easy once you find out what the original encoding was. You can select the encoding from the bottom right corner of your window; you will be prompted with a dialog box saying:

The encoding you've chosen ('[encoding type]') may change the contents of '[Your file]'. Do you want to reload the file from disk or convert the text and save in the new encoding?

So if you happen to have a few characters saved in some odd encoding, first select 'Reload' to load the whole file in the encoding of the bad characters. For me this turned the ? characters into their proper values.

IntelliJ can tell when you most likely did not pick the right encoding and will warn you. Revert and try again.

Once you can see the bad characters go away, change the encoding select box in the bottom right corner back to the format you originally intended (if you are Googling this error message, that will likely be UTF-8). This time select the 'Convert' button on the dialog.

For me, I needed to reload as 'windows-1252', then convert back to 'UTF-8'. The offending characters were curly single quotes (‘ and ’), likely pasted in from a Word doc (or e-mail) with the wrong encoding, and the above actions convert them to UTF-8.
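
The same "reload, then convert" idea can be sketched outside the IDE. This is only an illustration, assuming the file really was written as windows-1252 (a charset that common JDKs ship, though the Java specification does not guarantee it); the class name `ReloadThenConvert` and the `transcode` helper are hypothetical:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReloadThenConvert {
    // Decodes the bytes with the charset they were really written in ("reload"),
    // then encodes the resulting text in the target charset ("convert").
    static byte[] transcode(byte[] input, Charset from, Charset to) {
        return new String(input, from).getBytes(to);
    }

    public static void main(String[] args) {
        Charset cp1252 = Charset.forName("windows-1252");
        // 0x91 and 0x92 are the curly single quotes in windows-1252.
        byte[] original = {(byte) 0x91, 'a', (byte) 0x92};

        byte[] converted = transcode(original, cp1252, StandardCharsets.UTF_8);
        String text = new String(converted, StandardCharsets.UTF_8);

        System.out.println(text.equals("\u2018a\u2019")); // true
        System.out.println(converted.length);             // 7
    }
}
```

For a real file, the input bytes could come from `Files.readAllBytes` and the result could be written back with `Files.write`.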

The compiler is using the UTF-8 character encoding to read your source file. But the file must have been written by an editor using a different encoding. Open your file in an editor set to the UTF-8 encoding, fix the quote mark, and save it again.
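
If the bad character is hard to spot in an editor, its position can be located programmatically. The following is a rough sketch using a strict UTF-8 decoder; the class name `Utf8Locator` and the method `firstInvalidOffset` are made up for this example:

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class Utf8Locator {
    // Returns the byte offset of the first sequence that is not valid UTF-8,
    // or -1 if all bytes decode cleanly.
    static int firstInvalidOffset(byte[] bytes) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        ByteBuffer in = ByteBuffer.wrap(bytes);
        // UTF-8 never yields more chars than input bytes, so this cannot overflow.
        CharBuffer out = CharBuffer.allocate(bytes.length);
        CoderResult result = decoder.decode(in, out, true);
        // On an error the input buffer is positioned at the start of the bad bytes.
        return result.isError() ? in.position() : -1;
    }

    public static void main(String[] args) {
        byte[] latin1 = "*=\u00AC.,".getBytes(StandardCharsets.ISO_8859_1);
        byte[] utf8 = "*=\u00AC.,".getBytes(StandardCharsets.UTF_8);
        System.out.println(firstInvalidOffset(latin1)); // 2
        System.out.println(firstInvalidOffset(utf8));   // -1
    }
}
```

To check a source file, pass it the result of `Files.readAllBytes(Paths.get(...))` for that file.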

Alternatively, you can find the Unicode code point for the character and use a Unicode escape in the source code. For example, the character A can be replaced with the Unicode escape \u0041.
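
Applied to the method from the question, the escape approach might look like the sketch below. It assumes the intended character is ¬ (NOT SIGN, U+00AC); the class name `PasswordCheck` is hypothetical. With the escape in place the source file contains only ASCII bytes, so it compiles the same way under UTF-8 and ISO-8859-1:

```java
import java.util.regex.Pattern;

public class PasswordCheck {
    public static boolean isValidPasswd(String passwd) {
        // Same pattern as in the question, with the NOT SIGN written as a Unicode escape.
        String reg = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=\u00AC.,-])(?=[^\\s]+$).{8,24}$";
        return Pattern.matches(reg, passwd);
    }

    public static void main(String[] args) {
        System.out.println(isValidPasswd("Abcdef1\u00AC")); // true
        System.out.println(isValidPasswd("Abcdef12"));      // false (no special character)
    }
}
```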

By the way, you don't need the begin- and end-line anchors ^ and $ when using the matches() method. With matches(), the regular expression must match the entire sequence. The anchors are only useful with the find() method.
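
A short sketch of the difference between the two methods; the class name `AnchorsDemo` and the helper methods are made up for illustration:

```java
import java.util.regex.Pattern;

public class AnchorsDemo {
    // matches() succeeds only if the whole input matches the pattern.
    static boolean matchesWhole(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    // find() succeeds if any part of the input matches the pattern.
    static boolean findsAnywhere(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String input = "abc123";
        System.out.println(matchesWhole("[a-z]+", input));    // false
        System.out.println(matchesWhole("^[a-z]+$", input));  // false
        System.out.println(findsAnywhere("[a-z]+", input));   // true
        System.out.println(findsAnywhere("^[a-z]+$", input)); // false
    }
}
```

With matches() the anchored and unanchored patterns give the same result, while with find() only the anchored pattern rejects the partial match.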

"error: unmappable character for encoding UTF-8" means that javac found a character in the source file that cannot be decoded as UTF-8. Open the file in an editor and set the character encoding to UTF-8. You should be able to find the character that is not represented in UTF-8. Remove this character and recompile.

The following compiles for me:

class E{
   String s = "^(?=.*[0-9])(?=.*[a-z])(?=.*[A-Z])(?=.*[~#;:?/@&!\"'%*=¼.,-])(?=[^\\s]+$).{8,24}$";
}

I observed this issue while using Eclipse. I needed to add the encoding to my pom.xml file, and that resolved it. See http://ctrlaltsolve.blogspot.in/2015/11/encoding-properties-in-maven.html

I had a similar issue and fixed it from the bottom corner of the IntelliJ window.

I changed the setting there from LF to CRLF.

Here is how the bottom corner of IntelliJ looks: [image: IntelliJ_image]
